Abstract

Many  VLSI  circuit  designs  are  too  large  to  be  simulated  with  VHDL  in  a  reasonable  amount 
of  time.  One  approach  to  reducing  the  simulation  time  is  to  distribute  the  simulation  over  several 
processors.  This  research  creates  an  environment  for  designing  and  simulating  structural  VHDL 
circuits  on  the  Intel  iPSC/2  and  iPSC/860  Hypercubes.  Logic  gates  and  system  behaviors  are 
partitioned among the processors, and signal changes are shared via event messages. Circuit
simulations are run over the SPECTRUM parallel simulation testbed, and the null-message paradigm is
used  to  avoid  deadlock.  Structural  circuits  ranging  from  forty  to  over  one  thousand  logic  gates  are 
correctly  simulated.  Although  no  attempt  is  made  to  find  optimal  partitioning  strategies,  speedups 
are  obtained  for  some  configurations. 
PARALLEL SIMULATION OF
STRUCTURAL  VHDL  CIRCUITS  ON 
INTEL  HYPERCUBES 


I.  Introduction 


1.1  Background. 

Advances  in  Very  Large  Scale  Integrated  (VLSI)  circuit  technology  increase  the  transistor 
count  on  a  chip  by  about  25%  per  year,  doubling  every  three  years  (17:17).  In  order  to  efficiently 
design  increasingly  complex  VLSI  circuits,  designers  use  simulation  tools  to  validate  their  circuits 
prior  to  fabrication.  In  1979,  the  Department  of  Defense  (DOD)  started  the  Very  High  Speed 
Integrated  Circuit  (VHSIC)  program  to  employ  the  use  of  high  density  VLSI  circuits  in  military 
systems. The VHSIC Hardware Description Language (VHDL) program began in 1983 to
standardize the tools needed to efficiently design and test these circuits (13, 22).

Many  circuit  designs  are  too  complex  to  be  simulated  with  VHDL  in  a  reasonable  amount 
of  time.  In  an  effort  to  improve  VHDL’s  performance,  the  Defense  Advanced  Research  Projects 
Agency  (DARPA)  has  sponsored  the  QUEST  project,  whose  goal  is  a  thousand-fold  speed-up 
in  VHDL  simulation  (28:1-1).  One  approach  to  reducing  the  simulation  time  is  to  distribute  the 
simulation  of  the  design  over  several  processors.  If  VHDL’s  capabilities  could  be  effectively  mapped 
to  a  parallel  processor,  the  simulation  would  be  faster  and  users  could  design  and  run  more  complex 
circuits.  Efforts  at  AFIT  have  centered  on  creating  a  parallel  implementation  of  VHDL  for  this 
purpose. 

In  1991  AFIT  investigated  the  data  structures  of  Intermetrics’  sequential  VHDL  simulator 
and  demonstrated  a  way  to  intercept  intermediate  C  code  from  Intermetrics’  compiler,  transform  it, 
and run parallel simulations on the Intel iPSC/2 Hypercube (10). This research effort assembles the
tools  necessary  to  create  and  run  structural  VHDL  simulations  on  the  Intel  iPSC/2  and  iPSC/860 
Hypercubes. 

1.2  Problem  Statement. 

AFIT  has  investigated  implementing  a  parallel  VHDL  simulator  to  decrease  the  simulation 
times of VLSI circuits; however, an automated method for creating parallel VHDL circuit
descriptions, a correct parallel simulator, and a common distributed testbed are necessary to generate and
simulate  large  VHDL  circuit  models. 

1.3  Research  Objectives. 

The  main  objective  of  this  thesis  is  to  demonstrate  and  test  the  capability  of  mapping  large 
sequential  VHDL  circuit  descriptions  to  distributed  processing  systems.  The  main  goals  are  to 

•  automate  the  procedures  for  generating  hierarchical,  structural  VHDL  models. 

•  create  a  VHDL  simulator  that  correctly  simulates  structural  VHDL  circuit  descriptions  and 
is  flexible  enough  to  partition  simulations  among  the  processors  of  a  distributed  system. 

•  provide  a  common  testbed  to  facilitate  experimentation  with  parallel  simulation  protocols 
and  investigation  into  optimizing  circuit  partitioning  strategies. 

•  demonstrate  the  simulator  with  several  VHDL  models. 

•  determine  if  speedup  can  be  achieved  through  the  use  of  parallel  simulations. 
1.4 Assumptions.


Comeau did the preliminary research into transforming Intermetrics’ VHDL models into
models that can be simulated in a parallel environment. The following assumptions build upon Comeau’s
research (10:1-3):

•  While  strict  meanings  of  “parallel”  and  “distributed”  processing  systems  vary  from  source  to 
source,  AFIT  has  generally  accepted  “parallel  processing”  to  indicate  processing  on  a  single 
computer  composed  of  multiple  processors,  while  “distributed  processing”  refers  to  processing 
among  several  “independent”  computers  across  a  network.  Nonetheless,  as  with  Comeau’s 
thesis,  this  research  uses  the  terms  “parallel”  and  “distributed”  interchangeably  throughout. 

•  The  parallel  computers  used  for  development  and  research  are  the  Intel  iPSC/2  and  iPSC/860 
Hypercubes. 

•  Source  code  is  written  in  the  standard  C  programming  language  (non-ANSI). 

•  To  further  research  efforts  for  both  DARPA  and  AFIT  and  stay  consistent  with  the  AFIT 
environment, the Chandy-Misra conservative synchronization algorithm for event-driven
simulations is used. In this thesis, the null-message protocol is implemented via the use of a
parallel  simulation  environment  known  as  SPECTRUM  (Simulation  Protocol  Evaluation  on 
a  Current  Testbed  using  Reusable  Modules)  (32). 

• The outputs of the analyze, model generate, and build phases of the Intermetrics VHDL
compiler are correct and accessible.

•  The  VHDL  test  cases  are  within  the  VHDL  subset  that  is  used  to  demonstrate  parallelized 
VHDL. 

•  VHDL  source  code  is  compiled  and  model  generated  in  Intermetrics  VHDL,  Version  2.1, 
September  1990. 
1.5 Scope.


Comeau  outlined  ten  steps  to  transform  Intermetrics’  intermediate  C  code  into  modules  that 
can  run  on  a  parallel  VHDL  simulator  (10:4-6).  These  operations  are  automated,  and  new  steps 
are  added  to  reduce  unnecessary  function  calls  and  enhance  simulator  capabilities. 

A  new  parallel  simulator,  VSIM,  is  written.  The  concepts  for  VHDL  simulation  are  taken 
from Intermetrics’ simulator and from Comeau’s parallel VHDL simulator, PVSIM. The
parallelization of VSIM is accomplished with minimal changes to the application by utilizing
SPECTRUM.

The  parallel  simulation  protocol  is  implemented  using  SPECTRUM  “filters.”  This  provides 
a  level  of  modularity  that  aids  future  experimentation  with  new  protocols  and  instrumentation. 

Various  circuits  are  implemented  and  tested.  Also,  feedback  among  LPs  is  demonstrated. 

1.6  Limitations. 

1.6.1 VHDL Source Code Limitations for VSIM. The subset of circuits that can be simulated
with VSIM includes structural descriptions of logic gates and other simple processes. Circuits
are created the same way as for Intermetrics’ simulator, with the following limitations:

Signals can be bits or bit-vectors; however, bit-vector inputs must be described one bit at a
time, e.g., Bus(0) <= '1' after 10 ns;.

Processes should be one-line descriptions (Out1 <= In1 AND In2 after gate_delay;); however,
multiline processes, delimited by begin and end process, may be used provided they either wait
on all signals or execute only once. For example, if a process has input signals a, b,
and c, then the following process declarations are acceptable:

process
begin
    wait on a, b, c;
    -- process description here
end process;

process(a, b, c)
begin
    -- process description here
end process;

process
begin
    -- process description here
    wait;  -- that is, wait indefinitely
end process;


It is uncertain how functions and procedures behave in VSIM. For example, functions that
describe multi-valued logic or bus resolution have not been implemented or tested.

Likewise, VHDL attributes, “buffer” ports, file I/O, etc., have not been
implemented or tested.

1.6.2 Postprocessor Limitations. A postprocessor, pbuild, is designed to transform
Intermetrics-generated intermediate C code for parallel simulation with VSIM. Therefore, the
postprocessor only works for Intermetrics-generated intermediate C code.

The  postprocessor  depends  heavily  on  recognizing  unique  patterns  in  the  intermediate  C  code. 
This  is  accomplished  using  lex,  a  UNIX-based  lexical  analyzer.  If  future  enhancements  are  to  be 
made  to  the  postprocessor,  or  if  the  subset  of  VHDL  circuits  is  to  be  expanded,  each  step  of  the 
postprocessor  should  be  re-evaluated  for  possible  impact. 
1.6.3 VSIM Limitations. The user must first run the parallel simulator on one node to
identify behavior id’s.1 This is accomplished by enabling a “MAPPING” definition in the simulator.
Only then can a circuit-to-process mapping be defined.

Circuit partitioning must be done “by hand,” i.e., the user creates the appropriate files to
define logical process (LP) relationships and behavior-to-LP assignments.

The “receive message” filter used with SPECTRUM is based on an existing filter called
“chan-clocks.” However, the new filter is modified to have access to the local LP’s next event time
in VSIM; therefore, the protocol (in the SPECTRUM filter) is modified in an application-specific
manner.

When  OUTPUT  is  defined  in  VSIM,  every  signal  change  is  reported.  This  becomes  a  bottleneck 
in  parallel  simulations  on  Intel  Hypercubes,  as  processors  contend  for  common  resources,  e.g.,  the 
host  operating  system  and  the  disk  drives. 

1.7  Thesis  Overview. 

Chapter  2  analyzes  the  current  research  efforts  in  parallel  discrete-event  digital  simulation  and 
how  they  relate  to  this  thesis.  Also,  other  efforts  in  parallel  VHDL  simulation  are  reviewed.  Chapter 
3  provides  the  methodology  for  implementation  of  the  post-processor,  the  parallel  VHDL  simulator, 
and  enhancements  to  the  parallel  VHDL  environment.  Implementation  of  this  methodology  is 
discussed  in  Chapter  4.  Chapter  5  discusses  the  research  findings  and  results.  Finally,  conclusions 
and  recommendations  for  further  research  are  included  in  Chapter  6. 

In  addition,  the  following  appendices  are  included: 

•  Appendix  A:  Definitions. 

1 A  “behavior”  is  an  executable  process  representing  a  VHDL  logic  gate  or  other  simple  process. 
• Appendix B: AFIT Parallel VHDL User’s Guide. Documentation on how to prepare and run
VHDL descriptions in the parallel processing environment. A test case using an edge-triggered
D flip-flop is also demonstrated.

•  Appendix  C:  Subset  of  VHDL  Source  Code  for  Parallel  Simulation.  Describes,  with  examples, 
the  subset  and  syntax  for  VHDL  source  that  can  be  simulated  with  the  parallel  VHDL 
simulator. 

•  Appendix  D:  Design  of  the  Wallace  Tree  Multiplier. 

•  Appendix  E:  Summary  of  Performance  Data. 

1.8  Summary. 

VHDL  models  are  executed  sequentially  in  current  commercial  simulators.  As  chip  designs 
grow  larger  and  more  complex,  simulations  must  run  faster.  One  approach  to  increasing  simulation 
speed  is  through  parallel  processing.  This  research  transforms  the  hierarchical  structural  models 
created by Intermetrics’ sequential VHDL simulator into models for parallel execution on the Intel
iPSC/2  and  iPSC/860  Hypercubes. 
II. Background


2.1  Overview. 

In  this  chapter,  several  simulation  techniques  are  discussed,  including  traditional  simulation 
techniques  on  sequential  machines,  distributed  simulation  techniques,  and  digital  logic  simulation. 
Also,  previous  attempts  to  parallelize  VHDL  are  reviewed. 

2.2  Traditional  Simulation. 

Many  real-world  systems  can  be  modeled  and  simulated,  using  computers,  to  study  their 
behavior  under  various  conditions.  Examples  of  simulators  include  battlefield  simulators,  flight 
simulators,  simulations  of  factory  assembly  lines,  electronic  circuit  simulations,  etc.  In  continuous 
simulations,  the  state  of  the  model  may  change  continuously  over  time.  Discrete-event  simulations 
are  used  to  model  processes  whose  states  change  discretely  at  specified  points  in  time,  as  shown  in 
Figure  1  (27).  Continuous  systems,  like  digital  circuits,  may  also  be  modeled  with  discrete-event 
simulations. 

Sequential  simulators  usually  utilize  three  data  structures  (15): 

1.  The  state  variables  which  describe  the  state  of  the  system. 

2.  An  event  list  which  contains  the  schedule  of  all  future  events. 

3.  A  global  clock  variable  to  maintain  the  simulation  time. 

Figure 1. Response Measurement from a Discrete-Event Simulator (27)

There are two main methods of implementing discrete simulations: time-driven simulations
and event-driven simulations. In time-driven simulations, the global simulation clock is used to
advance the simulation uniformly through time. With respect to digital circuits, the time-driven
approach is not very efficient. If a circuit is in a quiescent state for a long period of time, waiting for
the clock to advance becomes time consuming and reduces performance. In event-driven simulation,
processes schedule their outputs on a global event list. Then the simulation clock can advance from
one event time to the next, since no computations need to occur between event times (26:136-137).
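
The event-driven scheme just described can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the data structures of Intermetrics’ simulator or of VSIM: the names run_event_driven and and_gate, the signal names, and the gate delay of 2 time units are all hypothetical. The sketch keeps the three structures listed earlier (state variables, an event list, and a global clock) and lets the clock jump directly from one event time to the next.

```python
import heapq


def run_event_driven(initial_events, handlers, end_time):
    """Process (time, signal, value) events in time-stamp order.

    handlers maps a signal name to a function(time, value, state) that
    returns a list of new (time, signal, value) events to schedule.
    """
    state = {}                          # state variables of the model
    event_list = list(initial_events)   # schedule of all future events
    heapq.heapify(event_list)           # kept ordered by time-stamp
    clock = 0                           # global simulation clock
    trace = []
    while event_list and event_list[0][0] <= end_time:
        # Jump straight to the next event time; nothing happens in between.
        clock, signal, value = heapq.heappop(event_list)
        state[signal] = value
        trace.append((clock, signal, value))
        handler = handlers.get(signal)
        if handler is not None:
            for new_event in handler(clock, value, state):
                heapq.heappush(event_list, new_event)
    return clock, state, trace


def and_gate(delay):
    """Hypothetical behavior: out <= a AND b after `delay` time units."""
    def handler(time, value, state):
        out = state.get("a", 0) & state.get("b", 0)
        return [(time + delay, "out", out)]
    return handler


gate = and_gate(delay=2)
clock, state, trace = run_event_driven(
    [(0, "a", 1), (5, "b", 1)], {"a": gate, "b": gate}, end_time=100)
```

In this run only four event times are visited (0, 2, 5, and 7), whereas a time-driven loop would step through every clock tick up to the end time.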

Given this introduction to traditional discrete-event simulation, techniques for distributed
simulation  can  be  reviewed. 

2.3  Distributed  Simulation. 

With traditional techniques on sequential processors, large simulations in engineering,
meteorology, military applications, and circuit design, to name a few, consume large amounts of time (15).
Parallel discrete-event simulation, or distributed simulation, refers to the execution of a simulation
on a number of processors. Ideal candidates for distributed simulation are systems whose
physical processes (PPs) execute concurrently and can be modeled by message passing among their
corresponding  logical  processes  (LPs)  (7:198-199).  Electronic  circuit  systems  can  be  simulated  in 
this  way,  where  the  LPs  representing  the  components,  or  groups  of  components,  that  make  up  the 
circuit  are  partitioned  among  the  processors.  Hence,  the  time  required  to  complete  a  simulation 
should  decrease  since  computations  are  executed  in  parallel  (5:11). 
2.3.1 General Performance Model. The use of a global clock in distributed simulation
constitutes a bottleneck because the LPs would all operate in lock-step. At any global time $t$, a number
of LPs may have nothing to do. In asynchronous models, however, each LP maintains a local
virtual time (LVT), and the LPs are allowed to progress at irregular intervals. In most models, LPs
communicate via time-stamped messages in the form of tuples, $(t_k, m_k)$, where $m_k$ is the message
sent at LVT $t_k$ (7:199). The specific rules for message passing depend on the particular protocol.

A  global  event-list  would  also  be  a  bottleneck  in  distributed  simulation.  Therefore,  each 
LP usually maintains its own event-list, or queue. Events either received or self-generated can be
scheduled  in  the  local  event-list,  if  necessary,  as  well  as  sent  to  “downstream”  LPs,  as  required  by 
the  model  (7:198). 

2.3.2 Speed-Up and Efficiency of Distributed Simulations. If the simulation time for $p$
processors is $T_p$, and the time for the same simulation on one processor is $T_1$, then the speed-up of the
distributed simulation is $T_1/T_p$. An ideal speed-up would be $p$. The efficiency of the simulation
is therefore the speed-up divided by $p$. The efficiency indicates how much communications
overhead, time management, the available concurrency, and load imbalance among LPs limit the
overall speed-up (29:43) (14).
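
As a worked example of these two definitions, the following Python sketch computes speed-up and efficiency. The function names and the run times used in the usage note (120 seconds on one processor, 40 seconds on four) are hypothetical and are not measurements from this research.

```python
def speedup(t1, tp):
    """Speed-up of a p-processor run: one-processor time over p-processor time."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Efficiency: speed-up divided by the number of processors p."""
    return speedup(t1, tp) / p
```

With the hypothetical times above, speedup(120.0, 40.0) is 3.0 against an ideal of 4, and efficiency(120.0, 40.0, 4) is 0.75.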

2.3.3  Distributed  Simulation  Protocols.  Asynchronous  simulation  protocols  can  be  loosely 
classified  as  either  conservative  or  optimistic.  Conservative  protocols  allow  an  LP  to  advance  its 
LVT  only  when  it  is  absolutely  certain  it  cannot  receive  an  event  with  a  time-stamp  less  than  the 
new LVT. Optimistic protocols allow each LP to proceed at its own pace even though events may
later arrive with time-stamps in the LP’s past. Time Warp, an optimistic protocol, corrects
out-of-order messages by rolling back, i.e., restoring the LP’s state to a time prior to the message
time and then recomputing forward. Therefore,
optimistic protocols require state saving capabilities for each LP.
2.3.3.1 Conservative Distributed Simulation Protocols. Chandy and Misra have
proposed an asynchronous conservative protocol where each LP manages its LVT and event-list as
follows:

An LP simulates the corresponding PP in the following manner. Let the sequence of
messages sent by $LP_i$ to $LP_j$ be $(t_1, m_1), (t_2, m_2), (t_3, m_3), \ldots$ We require that

1. $0 < t_1 < t_2 < t_3 \cdots$ (monotonicity) and

2. $PP_i$ must have sent message $m_k$ to $PP_j$ at time $t_k$, $k = 1, 2, 3, \ldots$ and

3. $PP_i$ must have sent no other messages to $PP_j$ besides $m_1, m_2, \ldots, m_k, \ldots$, i.e., the
sequence of messages sent by an LP must correspond exactly to the actual sequence
of messages sent by the corresponding PP. During the course of the simulation, if
$LP_i$ sends $LP_j$ a message $(t_k, m_k)$ it implies that all messages from $PP_i$ to $PP_j$
have been simulated up to time $t_k$. (7:199)

This  model  requires  static  allocation  of  processes,  i.e.,  the  distribution  of  LPs  among  the 
processors is fixed, and the communication paths among the LPs are known prior to simulation (9).
Digital  circuit  simulations,  including  VHDL,  conform  to  this  assumption.  This  model  also  assumes 
no  buffering  of  messages,  so  a  sending  LP  must  wait  for  all  downstream  LPs  to  receive  a  message 
before  it  can  progress.  Also,  an  LP  must  wait  for  messages  from  upstream  LPs  whose  clock  values 
are  equal  to  its  LVT  (7:200). 
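
The conservative rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Chandy and Misra’s algorithm as published nor the VSIM implementation: the class ConservativeLP and its methods are hypothetical, messages are buffered in per-channel queues for simplicity (the model above assumes no buffering), and equal time-stamps on a channel are accepted. Each input channel remembers the time-stamp of its last message, and the LP only processes events up to the minimum of those channel clocks.

```python
from collections import deque


class ConservativeLP:
    """Hypothetical LP that advances only as far as its input channels allow."""

    def __init__(self, input_names):
        self.channels = {name: deque() for name in input_names}
        self.channel_clock = {name: 0 for name in input_names}
        self.lvt = 0  # local virtual time

    def receive(self, channel, timestamp, message):
        # Time-stamps on one channel must never go backwards (monotonicity).
        if timestamp < self.channel_clock[channel]:
            raise ValueError("time-stamps on a channel must not decrease")
        self.channel_clock[channel] = timestamp
        self.channels[channel].append((timestamp, message))

    def safe_time(self):
        # No channel can later deliver anything earlier than its own clock,
        # so the minimum channel clock is safe to simulate up to.
        return min(self.channel_clock.values())

    def process_safe_events(self):
        horizon = self.safe_time()
        ready = []
        for name, queue in self.channels.items():
            while queue and queue[0][0] <= horizon:
                timestamp, message = queue.popleft()
                ready.append((timestamp, name, message))
        ready.sort()                      # handle events in time-stamp order
        self.lvt = max(self.lvt, horizon)
        return ready
```

An LP with inputs "a" and "b" that has received only (10, x) on "a" cannot process it, because "b" might still deliver an earlier event. Once "b" delivers a message stamped 15, the event at time 10 becomes safe and the LVT advances to 10.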

Misra shows that given this protocol, deadlock can occur in two different ways (25:55).
Consider the simple model of Figure 2. Suppose for every message sent by LP0, LP1 generates a
message and only sends it to LP2. Then LP4 never receives a message from LP3, because LP3’s
LVT is still at 0. Therefore, the LVTs of LP4 and LP5 each remain at 0. The other situation that
can cause deadlock is cyclic waiting, as shown in Figure 3. The numbers on each arc correspond
to the time-stamp of the last message sent. None of the LPs send a message without receiving one
first, i.e., they do not predict future messages. LP2 has received a message at t = 20 and advanced
its LVT to 20, and has not generated a corresponding output message (this particular message was
consumed). So, LP1 is waiting at t = 20 to receive a message from LP3, and LP3 is waiting at
t = 15 for a message from LP2, while LP2 is waiting on LP1. Hence, deadlock has occurred (25, 7).


The  two  methods  for  handling  deadlocks  are  avoidance  and  detection.  Initially,  Chandy  and 
Misra proposed the use of null messages as a means of deadlock avoidance (6). A null message,
$(t, \mathrm{NULL})$, contains just an updated time and no other state information. The null message
guarantees that no further messages are sent with a time less than $t$. In the case of Figure 2, every time
LP1 sends a message to LP2, it sends a null message with the same time-stamp to LP3. This allows
LP3, LP4, and LP5 to progress. For the cyclic waiting problem of Figure 3, after LP2 receives
a message from LP1 at $t = 20$, it sends $(20 + t_{\mathrm{LP2delay}}, \mathrm{NULL})$, where $t_{\mathrm{LP2delay}}$ corresponds to
the propagation delay of LP2. If $t_{\mathrm{LP2delay}} = 5$, then LP2 sends $(25, \mathrm{NULL})$, and LP3 responds
by sending a null message to LP1 with LP3’s propagation delay added to the time-stamp, e.g.,
$(25 + t_{\mathrm{LP3delay}}, \mathrm{NULL})$. In this way, the simulation advances.
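
The arithmetic of the cyclic example can be traced with a small Python sketch. The helper names null_message and propagate_nulls are hypothetical, NULL is represented by Python’s None, and the propagation delay of 3 used for LP3 in the usage note is an assumed value, since only LP2’s delay of 5 is given above.

```python
NULL = None  # a null message carries a time-stamp but no payload


def null_message(lvt, propagation_delay):
    """Build the (time-stamp, NULL) tuple an LP sends after reaching `lvt`.

    The tuple promises that no real message with an earlier time-stamp
    will follow on the same channel.
    """
    return (lvt + propagation_delay, NULL)


def propagate_nulls(start_time, delays):
    """Follow one null message along a path of LPs.

    delays[i] is the propagation delay of the i-th LP on the path; the
    result lists the time-stamp each LP sends to its successor.
    """
    timestamps = []
    t = start_time
    for delay in delays:
        t, _ = null_message(t, delay)
        timestamps.append(t)
    return timestamps
```

Here null_message(20, 5) gives (25, None), matching the message LP2 sends, and propagate_nulls(20, [5, 3]) gives [25, 28] for LP2 followed by LP3. If every delay on the cycle were zero, the time-stamps would never increase, which is why the propagation delays are what let the cycle make progress.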

The null message approach is costly because a large fraction of the messages, and therefore of the
communication overhead, turns out to be null messages (7). Another technique proposed by Chandy
and Misra is to allow the simulation, or a subset of the simulation, to deadlock and then use a
master controller to detect and recover from deadlock (7:202). Detection can be accomplished
using the termination detection algorithm of Dijkstra and Scholten, or a method proposed by
Chandy and Misra (8:148) (7:202). Then, the controller polls all LPs that are deadlocked for their
earliest  next  event  time.  The  minimum  of  these  times  is  the  safe  time  for  all  deadlocked  LPs  to 
advance,  since  no  events  can  occur  before  this  safe  time.  Therefore,  the  controller  broadcasts  the 
safe  time  to  all  affected  LPs,  which  in  turn  update  their  LVTs,  and  the  simulation  continues. 
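
The recovery step can be sketched in Python as below. Only the recovery arithmetic is shown; deadlock detection itself (for example, the Dijkstra and Scholten algorithm) is not modeled. The function name recover_from_deadlock, the dictionary layout, and the LP names and times in the usage note are hypothetical.

```python
def recover_from_deadlock(next_event_times, lvts):
    """Compute the safe time and the updated LVTs for deadlocked LPs.

    next_event_times maps each deadlocked LP to its earliest pending event
    time, or None if it has no pending event. lvts maps each LP to its
    current local virtual time.
    """
    candidates = [t for t in next_event_times.values() if t is not None]
    if not candidates:
        return None, dict(lvts)        # nothing left to simulate
    safe_time = min(candidates)        # no event can occur before this time
    # "Broadcast": every deadlocked LP advances to at least the safe time.
    new_lvts = {lp: max(lvt, safe_time) for lp, lvt in lvts.items()}
    return safe_time, new_lvts
```

For example, with next event times {"LP1": 30, "LP2": 22, "LP3": None} and LVTs {"LP1": 20, "LP2": 20, "LP3": 15}, the safe time is 22 and all three LPs advance their LVTs to 22.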

The use of a central controller affects simulation performance since it must periodically
intervene and evaluate the simulation to see if deadlock has occurred. However, Chandy and Misra
maintain  the  interference  is  not  expected  to  be  a  bottleneck  since  active  interference  occurs  only 
at  deadlock  (7:202). 

2.3.3.2 Optimistic Distributed Simulation Protocols. The Time Warp mechanism for
distributed simulation is an optimistic protocol. LPs are allowed to go forward in time, at the risk
that another process may send a message that affects the LP’s history. The LP then “rolls
back”  to  the  appropriate  time  in  order  to  handle  the  new  message.  This  requires  each  LP  to  have 
state-saving  capabilities,  and  to  have  the  ability  to  “unsend”  messages  that  now  are  invalid.  Also, 
because  of  its  rollback  capability,  the  Time  Warp  mechanism  can  handle  dynamic  process  allocation 
and  connectivity  (9).  In  his  survey  of  parallel  discrete-event  simulation  paradigms,  Fujimoto  gives 
the  following  description  of  Time  Warp: 


The  Time  Warp  mechanism,  based  on  the  Virtual  Time  paradigm,  is  the  most  well 
known  optimistic  protocol.  Here,  virtual  time  is  synonymous  with  simulated  time.  In 
Time  Warp,  a  causality  error  is  detected  whenever  an  event  message  is  received  that 
contains a timestamp smaller than that of the process’s clock (i.e., the timestamp of the
last  processed  message).  The  event  causing  rollback  is  called  a  straggler.  Recovery  is 
accomplished  by  undoing  the  effects  of  all  events  that  have  been  processed  prematurely 
by  the  process  receiving  the  straggler,  i.e.,  those  processed  events  that  have  timestamps 
larger  than  that  of  the  straggler. 

An  event  may  do  two  things  that  have  to  be  rolled  back:  it  may  modify  the  state  of 
the  logical  process,  and/or  it  may  send  event  messages  to  other  processes.  Rolling  back 
the  state  is  accomplished  by  periodically  saving  the  process’s  state,  and  restoring  an 
old  state  vector  on  rollback.  “Unsending”  a  previously  sent  message  is  accomplished  by 
sending  a  negative  or  anti-message  that  annihilates  the  positive  messages.  If  a  process 
receives  an  anti-message  that  corresponds  to  a  positive  message  that  it  has  already 
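processed, then that process must also be rolled back to undo the effect of processing
the soon-to-be annihilated positive message. Recursively repeating this procedure allows all of the
effects of the erroneous computation to eventually be canceled. It can be shown that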
this  mechanism  always  makes  progress  under  some  mild  constraints. 

As  noted  earlier,  the  smallest  timestamped,  unprocessed  event  in  the  simulation  is 
always  safe  to  process.  In  Time  Warp,  the  smallest  timestamp  among  all  unprocessed 
event  messages  (both  positive  and  negative)  is  called  global  virtual  time  (GVT).  No 
event  with  timestamp  smaller  than  GVT  will  ever  be  rolled  back,  so  storage  used  by 
such  events  (e.g.,  saved  states)  can  be  discarded.  Also,  irrevocable  operations  (such 
as  I/O)  cannot  be  committed  until  GVT  sweeps  past  the  simulated  time  at  which  the 
operation  occurred.  The  process  of  reclaiming  memory  and  committing  irrevocable 
operations  is  referred  to  as  fossil  collection.  (15) 
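
The rollback mechanism in the quotation can be illustrated with a toy Python sketch. It is not Time Warp as implemented in any real system: the class TimeWarpLP, its running-sum state, and the rule that every event sends one output message are assumptions chosen to keep the example short. The state is saved before every event, a straggler triggers a rollback, and the messages that must be “unsent” are returned as anti-messages.

```python
class TimeWarpLP:
    """Toy optimistic LP whose state is a running sum of event values."""

    def __init__(self):
        self.lvt = 0
        self.state = 0
        self.processed = []   # (timestamp, value, state saved before event)
        self.sent = []        # (timestamp, payload) messages already sent

    def _execute(self, timestamp, value):
        self.processed.append((timestamp, value, self.state))  # save state
        self.state += value
        self.lvt = timestamp
        self.sent.append((timestamp, self.state))

    def _rollback(self, straggler_time):
        undone = []
        # Undo every event processed with a timestamp after the straggler.
        while self.processed and self.processed[-1][0] > straggler_time:
            timestamp, value, saved_state = self.processed.pop()
            self.state = saved_state
            undone.append((timestamp, value))
        # Messages sent by the undone events must be annihilated.
        anti = [m for m in self.sent if m[0] > straggler_time]
        self.sent = [m for m in self.sent if m[0] <= straggler_time]
        self.lvt = self.processed[-1][0] if self.processed else 0
        undone.reverse()
        return undone, anti

    def receive(self, timestamp, value):
        """Process an event; return the anti-messages a rollback requires."""
        undone, anti = [], []
        if timestamp < self.lvt:          # straggler: causality error
            undone, anti = self._rollback(timestamp)
        self._execute(timestamp, value)
        for ts, val in undone:            # recompute forward
            self._execute(ts, val)
        return anti


def fossil_collect(lp, gvt):
    """Discard saved states older than GVT; they can never be rolled back."""
    lp.processed = [entry for entry in lp.processed if entry[0] >= gvt]
```

If events arrive stamped 10, 30, and then 20, the third is a straggler: the event at 30 is undone, its output (30, 3) is returned as an anti-message, and both events are re-executed in the correct order.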


2.4  Overview  of  SPECTRUM. 

In 1988, Reynolds recognized the existence of a spectrum of options for parallel simulation
protocol designs (30). In order to study classes of protocols for classes of applications, the
University of Virginia developed SPECTRUM (Simulation Protocol Evaluation on a Current Testbed
using Reusable Modules) (32). SPECTRUM is a common testbed used for creating parallel
simulations by taking an application and breaking it into application components, i.e., “pieces” of the
application that run concurrently. Each application component, along with a process manager and
node manager, makes up a logical process (LP), as shown in Figure 4. The process manager provides
LP-level functions to the application for initialization, local clock management, and event
handling. The node manager provides hardware-specific functions to the process manager for event
traffic among the LPs. To implement specific protocols, filters are written that “intercept” an
LP-level function call by the application. The filters may then invoke protocol-specific actions, such as
null message generation, LP polling for a message, etc.

Figure 4. Block Diagram of the SPECTRUM Testbed (31)
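
The idea of a filter that intercepts an LP-level call can be illustrated with the Python sketch below. It does not reproduce SPECTRUM’s actual interface, which is written in C and is not shown in this chapter: the names make_null_message_filter and send_event, the argument order, and the list-based “outbox” in the usage note are all hypothetical. The filter wraps the application’s send call and adds the null messages described earlier, so the application itself does not change.

```python
def make_null_message_filter(send_event, neighbors):
    """Wrap a hypothetical LP-level send function with null-message logic.

    send_event(destination, timestamp, payload) stands in for the LP-level
    call the application makes; neighbors lists every downstream LP.
    """
    def filtered_send(destination, timestamp, payload):
        # Pass the application's real event through unchanged.
        send_event(destination, timestamp, payload)
        # Protocol-specific action: tell every other downstream LP that
        # nothing earlier than `timestamp` will arrive on its channel.
        for other in neighbors:
            if other != destination:
                send_event(other, timestamp, None)
    return filtered_send
```

For instance, with outbox = [] and send = make_null_message_filter(lambda d, t, p: outbox.append((d, t, p)), ["LP2", "LP3"]), the call send("LP2", 10, "event") leaves outbox holding the real event for LP2 followed by a null message for LP3 with the same time-stamp. Keeping the protocol in the wrapper is what allows a different protocol to be tried by swapping the filter.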

AFIT has continued to maintain the SPECTRUM testbed as a baseline for queueing
simulations (33), battle simulations (3), and VHDL simulations. Also, research is being conducted
on  a  hardware  coprocessor  that  would  emulate  the  basic  SPECTRUM  functions  with  microcode 
capability  to  modify  simulation  protocols  (11).  For  details  on  the  SPECTRUM  environment  at 
AFIT,  refer  to  Hartrum  (16). 
2.5 Other Parallel VHDL Research.


In  1989,  Proicou  (28)  developed  a  distributed  system  consisting  of  a  scalable  kernel  that 
supports  VHDL  simulations  on  the  Intel  iPSC/2  Hypercube.  The  distributed  simulation  kernel 
was  an  extension  of  the  AFIT  VHDL  tool  set,  described  in  (12).  The  simulation  ran  over  the 
SPECTRUM  testbed.  Proicou  found  that  a  general  purpose  simulation  kernel  may  not  be  able  to 
take  advantage  of  the  presence  or  absence  of  feedback  loops  in  the  simulation  (28:7-1).  In  general, 
it  was  determined  unlikely  that  one  distributed  kernel  design  is  efficient  enough  to  provide  good 
performance  for  the  wide  range  of  VHDL  models.  For  example,  primarily  behavioral  descriptions 
may  contain  a  small  number  of  large  processes,  while  primarily  structural  descriptions  may  contain 
a  large  number  of  very  simple  processes  (18). 

In  1990,  Ball  and  Hoyt  (1)  reported  work  in  progress  to  implement  a  parallel  VHDL  simulator 
using “Adaptive Time Warp,” in which they look for better performance than Chandy-Misra or
Time Warp. Adaptive Time Warp is similar to Time Warp; however, it attempts to reduce
“time-faults,” i.e., messages that cause roll-back. If a process has recently experienced a high number of
“time-faults,”  it  suspends  execution  for  a  short  time,  known  as  the  “blocking  window,”  which  is 
proportional  to  the  message  bandwidth.  Work  is  in  progress  to  develop  a  testbed  which  implements 
this  strategy. 

Comeau’s 1991 thesis investigated how to modify a commercial VHDL compiler and simulator,
Intermetrics VHDL, for parallel simulation on the Intel iPSC/2 Hypercube (10). In so doing, he
looked for parallelism in the intermediate C code generated in the “model generate” phase of
compilation. Then, he modified the C code for compatibility with the iPSC/2 and his parallel simulator,
PVSIM. PVSIM combines a portion of Intermetrics’ C source code simulator routines
with routines added by Comeau. He tested the simulator on three 8-bit adder circuits: a carry-save
adder, a carry-lookahead adder, and a ripple-carry adder. In general, simulations using four to
eight processors exhibited a speedup of at least two over simulations on one node. His results led
him to the following conclusions:

•  Minimize  and  balance  the  number  of  active  signals  in  a  logical  process. 

• Carefully modify the Intermetrics-generated C source code.

•  Ensure  a  high  computation  to  communication  ratio.  (10:6-12) 

This  thesis  builds  on  the  lessons  learned  in  Comeau’s  research.  For  example,  modifying  Intermetrics’ 
C  code  is  now  automated. 

In  1992,  Zhang  (34)  investigated  possible  methods  to  partition  a  VHDL  design  for  hierarchical 
distributed simulation. He evaluated VHDL entities, blocks, and processes for modularity and concurrency.
Zhang reported the following observations with respect to determining the optimal structure
(entity, block, or process) for use as an “atomic model” in parallel VHDL simulation (34:203):

•  The  entity  descriptions  define  a  clear  interface  between  components;  however,  using  the  entity 
does  not  fully  utilize  the  inherent  parallelism  among  the  blocks  and  processes. 

•  The  block  would  exploit  more  parallelism  than  the  entity,  but  not  as  much  as  the  process. 
Also,  blocks  can  be  nested,  which  causes  concurrency  problems. 

•  VHDL  forces  concurrency  at  the  process  level,  so  for  the  greatest  amount  of  parallelism, 
Zhang  concludes  that  the  process  is  the  best  atomic  model  for  parallel  simulation.  However, 
processes  do  not  have  a  clearly  defined  interface — as  is  the  case  with  entities  and  blocks. 

Zhang introduces the refined process, which is generated by defining a connection port for every
signal  or  port  in  a  process  and  removing  wait  statements.  In  this  manner,  the  process  interface  is 
clearly  defined.  While  this  method  exploits  the  maximum  amount  of  parallelism  and  provides  a 
way  to  theoretically  study  the  behavior  of  a  VHDL  design,  Zhang  concludes  that  it  is  not  robust 
enough  for  practical  use  (34:204). 
2.6 Summary.


Discrete-event simulation mechanisms are commonly used in sequential simulations. By introducing
the concepts of logical processes, local virtual time, and message-passing, asynchronous
simulation  protocols  can  extend  simulation  principles  to  exploit  parallel  and  distributed  computers. 
The  conservative  Chandy-Misra  protocol  guarantees  an  LP  does  not  receive  messages  out  of  order 
with  respect  to  time,  but  a  mechanism  must  be  provided  to  avoid  or  detect  deadlock.  The  optimistic 
method  of  Time  Warp  allows  LPs  to  proceed  at  their  own  pace  based  on  present  information.  If 
a  message  comes  in  with  a  time  stamp  in  the  past,  then  an  LP  must  roll  back  to  that  simulation 
time  in  order  to  handle  the  message. 

As  interest  in  increasing  the  performance  of  VHDL  grows,  a  number  of  research  efforts  have 
been  conducted  to  investigate  ways  to  map  VHDL  simulations  to  parallel  processors.  This  thesis 
continues  the  work  initiated  by  Comeau — mapping  Intermetrics’  VHDL  capabilities  to  the  Intel 
iPSC/2,  and  now  also  to  the  iPSC/860. 
III. Methodology


3.1  Introduction. 

When  a  VHDL  circuit  is  compiled  with  the  Intermetrics  VHDL  toolset,  the  intermediate 
C  code  can  be  intercepted  and  transformed  to  be  linked  with  AFIT’s  parallel  VHDL  simulator 
(VSIM).  VSIM  can  run  sequentially  on  a  single  processor,  or  in  parallel  on  the  Intel  iPSC/2  or 
iPSC/860  Hypercubes.  For  parallel  simulations,  VSIM  runs  over  SPECTRUM.  The  subset  of 
VHDL  circuits  that  can  be  simulated  with  VSIM  includes  structural  descriptions  of  logic  gates 
and simple processes. The “behavioral instances” which represent these processes are grouped into
logical processes (LPs) and the LPs are distributed among the nodes of a hypercube of any size.

Comeau  identified  the  data  structures  and  the  basic  cycle  required  for  simulation  (10:3-9). 
This  chapter  reviews  the  data  structures,  simulation  cycle,  and  requirements  for  parallel  simulation. 

3.2  Overview. 

In  order  to  run  a  sequential  VHDL  simulation  with  the  Intermetrics’  VHDL  toolset,  the  circuit 
designer  must  compile  the  VHDL  source  code,  and  then  build  and  simulate  the  circuit  model.  This 
process  is  shown  in  Figure  5. 

Circuits  are  first  compiled  using  the  vhdl  command.  This  generates  an  IVAN  file  (which 
stands  for  Intermediate  VHDL  Attributed  Notation).  The  IVAN  file  contains  the  intermediate 
C  code  descriptions  of  the  circuit  components — which  the  simulator  uses.  By  using  Intermetrics’ 
compiler,  the  syntax  and  semantics  of  VHDL  circuit  descriptions  have  already  been  checked,  and 
correct  C  code  is  automatically  generated.  Normally,  generation  of  the  IVAN  file  is  transparent  to 
the  VHDL  circuit  designer. 

During  the  model  generate  phase,  the  specific  C  code  descriptions — and  their  header  files — are 
extracted from the IVAN file and object files are created. These files are also normally transparent
to the designer; however, for parallel simulations, they are transformed into files that are compatible
with VSIM.

Figure 5. Simulation Session Using Intermetrics’ Toolset (10:3-8)

In  the  build  phase,  a  compilation  script  is  generated  that  compiles  and  links  the  C  modules 
with  Intermetrics’  simulator  modules  for  operation.  Now,  the  circuit  can  be  simulated  with  the 
sim command, and a report can be generated with the rpt command using Intermetrics’ report
control  language. 

For  parallel  operation,  the  intermediate  C  code  is  transformed  into  C  code  that  can  be  linked 
with  VSIM  and  run  on  the  hypercube,  as  shown  in  Figure  6.  For  this  to  happen,  the  code  is 
transformed  using  a  postprocessor  called  pbuild,  which  reads  the  compilation  script  file  and  uses 
plex to extract and transform the intermediate code. The new code is linked with VSIM, which,
together  with  SPECTRUM,  runs  the  simulation  on  the  hypercube. 


Figure  6.  Parallel  Simulation  Session 

3.3  Data  Structures. 

The  data  structures  for  sequential  operation  are  based  upon  Intermetrics’  VHDL  simulator. 
The  four  main  data  structures  are  as  follows: 

•  Behavior  Instances.  The  behavior  instances  are  used  to  describe  the  behavior  of  each 
component  (AND  gate,  OR  gate,  etc.)  and  other  types  of  processes.  Behavior  instances 
contain  a  unique  id  number  and  a  pointer  to  their  execution  routine  in  memory.  Several 
behavior  instances  may  share  the  same  execution  routine,  e.g.,  all  AND  gates  in  a  circuit  may 
use  the  same  algorithm  to  execute.1  The  basic  structure  for  behavior  instances  is  shown  in 
Figure  7. 

•  Signal Records. The signal records maintain the current state of each signal, including a
unique identifier, signal name, current value, size, and pointers to behavioral instances (to
identify each signal’s connections). The current value field is an offset from a global address
space whose base is denoted by the global variable cv. See Figure 8 for an example of the
signal record structure.

1 This would be true if, for example, all AND gates used the same entity/architecture pair.

typedef struct BHINSTS {        /* behavior instance */
    BHKIND prty;                /* kind (user, system, etc.) */
    INT32 id;                   /* id */
    ERRT (*exec)();             /* behavior function */
} BHINST;

Figure 7. Basic Structure for Behavior Instances

typedef struct {                /* signal record */
    UINT32 id;                  /* id */
    char *name;                 /* name */
    unsigned size: 4;           /* size of data value (bytes) */
    UINT32 cval;                /* current value (offset) */
    CONNT *conns;               /* behavioral connections */
} SRREC;

Figure 8. Basic Structure for Signal Records

•  Behavior List. This list contains all behaviors scheduled to execute for the current simulation
time. At the beginning of the simulation (t = 0), all behaviors are scheduled for
execution to initialize their input and output values. As behaviors are executed, they are
removed from the list. After the simulation clock advances past zero, signal changes cause
affected behaviors to be re-scheduled and re-executed. The behavior list is a simple linked list
called tmpbeh (see Figure 9).2

2 The variable name tmpbeh is used to maintain consistency with Intermetrics’ naming conventions.

typedef struct TMPKS {          /* behavior list */
    BHINST *beh;                /* behavior instance pointer */
    struct TMPKS *nextb;        /* next behavior */
} TMPK;

Figure 9. Behavior List Structure

typedef struct SIG_RECS {       /* active record structure */
    int time;                   /* signal change time */
    SRP sr_ptr;                 /* signal record */
    int value;                  /* possible new value */
    struct SIG_RECS *next_sig_rec;
} SIG_REC;

Figure 10. Active Record Structure


•  Active  Records.  This  is  the  simulator’s  next-event  list,  called  actv.  An  “event”  corresponds 
to  a  behavior  output  value  that  may  be  a  signal  change.  Each  entry  contains  an  event  time,  a 
pointer  to  the  correct  signal  in  the  signal  record  list,  and  a  possible  new  value  for  that  signal 
(depending  on  delay  type,  etc.),  as  shown  in  Figure  10. 


An  example  of  the  interrelationship  of  the  VHDL  data  structures  is  shown  in  Figure  11.  Here, 
signal number 2, called CIN, is changing from a ‘0’ to a ‘1’ at time 50. The active record entry has
the  new  value,  and  a  pointer  to  the  specific  signal  record.  The  signal  record  has  a  pointer  to  the 
global  memory  space  in  cval,  and  the  list  of  affected  behaviors,  i.e.,  the  AND  gate  and  XOR  gate. 
Therefore,  these  behaviors  are  added  to  the  behavior  list  for  execution  at  time  50. 
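
The chain from active record to signal record to behavior list can be sketched in C. This is a minimal illustration, not VSIM's code: the names Behavior, Signal, ActiveRecord, schedule, and apply_event are hypothetical, the signal value is stored directly instead of as an offset from cv, and the connection list is a fixed-size array.

```c
#include <stddef.h>

#define MAX_CONNS 4

/* Hypothetical, simplified stand-ins for the VSIM structures. */
typedef struct Behavior {
    int id;
    int scheduled;               /* 1 if already on the behavior list */
    struct Behavior *nextb;      /* next behavior on the list */
} Behavior;

typedef struct {
    int id;
    const char *name;
    int cval;                    /* current value, stored directly here */
    Behavior *conns[MAX_CONNS];  /* behaviors that read this signal */
    int nconns;
} Signal;

typedef struct {
    int time;                    /* event time */
    Signal *sr_ptr;              /* signal the event refers to */
    int value;                   /* possible new value */
} ActiveRecord;

/* Push a behavior on the front of the list unless it is already there. */
static void schedule(Behavior **list, Behavior *b)
{
    if (!b->scheduled) {
        b->scheduled = 1;
        b->nextb = *list;
        *list = b;
    }
}

/* Apply one active record: ignore it if the value is unchanged,
 * otherwise update the signal and schedule every connected behavior.
 * Returns the number of behaviors newly scheduled. */
int apply_event(const ActiveRecord *ar, Behavior **list)
{
    Signal *s = ar->sr_ptr;
    int i, n = 0;

    if (s->cval == ar->value)
        return 0;                /* no change: the event is consumed */
    s->cval = ar->value;
    for (i = 0; i < s->nconns; i++) {
        if (!s->conns[i]->scheduled)
            n++;
        schedule(list, s->conns[i]);
    }
    return n;
}
```

With a signal connected to an AND gate and an XOR gate, an active record carrying a new value schedules both behaviors; a second record with the same value schedules nothing.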


3.4 Sequential Simulation Cycle.


Figure 11. Interrelationship of VHDL Simulation Data Structures (10:3-14)

Figure 12. The VHDL Simulation Cycle (10:3-15)

The sequential simulation cycle for VSIM is shown in Figure 12. The following “routines” run
the simulation:


•  post.  Posts  each  event  to  the  active  record  list  whenever  a  behavior  has  executed. 

•  get_low_time. Returns the lowest next-event time from the active records list. The simulation
clock is updated to this “low time.” Records with this time are removed from the active
record list and sent to the compare_values routine.

•  compare_values. Compares the new data value of each event to the old data value in
memory that is associated with that event’s behavior instance, i.e., circuit component. If the
value is the same, the event is simply ignored (the message is consumed); otherwise, affected
behaviors are scheduled on the behavior list for operation.

•  execute_behavior. Removes behaviors from the behavior list and executes them.3

3 Actual execution of each behavior instance occurs in the intermediate C code. These behavior functions call the
post function directly.
sim_it()
{
    SIG_REC *signal;

    /* while the active record list or the behavior list is not empty */
    while (actv != NULL || tmpbeh != NULL) {
        while (tmpbeh != NULL) {
            execute_behavior();                 /* execute behavior and post */
            remove_behavior();
        }
        update_sim_time(get_low_time());        /* process low time */
        while (signal = active_exists(*sim_time)) {
            if (unchanged(signal)) {            /* compare values */
                remove_signal(signal);
            }
            else {
                update_signal(signal);
                schedule_behaviors(signal);
                remove_signal(signal);
            }
        }
    }
}


Figure  13.  Main  Simulation  Loop  in  VSIM 

At  the  beginning  of  the  simulation,  input  signals  are  present  in  the  active  record  list,  and 
all behaviors are scheduled for execution at t = 0. The simulation starts at execute_behavior.
The  main  (sequential)  simulation  loop  in  VSIM  is  shown  in  Figure  13.  This  Figure  shows  that  the 
simulation  cycles  from  executing  behaviors  to  extracting  signal  changes  until  the  active  list  and 
behavior  list  are  empty.  Specifically,  while  either  list  is  not  empty,  perform  the  following: 


1.  Execute  all  behaviors  on  the  behavior  list,  posting  the  resulting  signals  after  each  execution. 

2.  Update  the  simulation  clock  to  the  next  lowest  time  on  the  active  list. 

3.  Extract  every  active  record  with  a  time-tag  equal  to  the  simulation  clock. 
4.  If the active records indicate a signal change (when compared to their current value in memory),
then update the signal’s value in memory and schedule affected behaviors.

5.  Go  back  to  step  1. 
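
The five steps can be traced on a toy circuit. The sketch below is an illustration under stated assumptions rather than VSIM itself: the circuit (two inverters in series, with delays of 2 and 3 time units), the arrays sigval and pending, and the function run are all hypothetical, and the behavior list is reduced to one pending flag per behavior.

```c
#include <stdlib.h>

#define NSIG 3
#define NBEH 2

typedef struct Event {
    int time;                 /* time-tag */
    int sig;                  /* signal index */
    int value;                /* possible new value */
    struct Event *next;
} Event;

Event *actv = NULL;           /* active list, kept in time order */
int sigval[NSIG];             /* current signal values, initially 0 */
int pending[NBEH];            /* stand-in for the behavior list */
int sim_time = 0;

/* Insert an event into the active list in time order. */
void post(int time, int sig, int value)
{
    Event **pp = &actv;
    Event *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->time = time; e->sig = sig; e->value = value;
    while (*pp != NULL && (*pp)->time <= time)
        pp = &(*pp)->next;
    e->next = *pp;
    *pp = e;
}

/* Hypothetical circuit: behavior 0 drives signal 1 = NOT signal 0 after 2,
 * behavior 1 drives signal 2 = NOT signal 1 after 3. */
static void execute_behavior(int b)
{
    if (b == 0)
        post(sim_time + 2, 1, !sigval[0]);
    else
        post(sim_time + 3, 2, !sigval[1]);
}

/* Run until both lists are empty; returns the final simulation time. */
int run(void)
{
    int b;
    for (b = 0; b < NBEH; b++)
        pending[b] = 1;                       /* all behaviors run at t = 0 */
    for (;;) {
        for (b = 0; b < NBEH; b++)            /* step 1: execute and post */
            if (pending[b]) { pending[b] = 0; execute_behavior(b); }
        if (actv == NULL)
            break;
        sim_time = actv->time;                /* step 2: advance the clock */
        while (actv != NULL && actv->time == sim_time) {   /* step 3 */
            Event *e = actv;
            actv = e->next;
            if (sigval[e->sig] != e->value) { /* step 4: real change? */
                sigval[e->sig] = e->value;
                if (e->sig < NBEH)
                    pending[e->sig] = 1;      /* behavior b reads signal b */
            }
            free(e);
        }
    }                                         /* step 5: loop */
    return sim_time;
}
```

Starting from all signals at 0, run() processes events at times 2, 3, and 5 and leaves the signals at 0, 1, 0, which is the steady state of the two inverters.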

3.5  Active  List  Management. 

A behavior executes at time t when one of its input signals changes at time t. When a behavior
executes, the resulting output signal is posted to the active list in time order at t + t_DELAY, where
t_DELAY is the delay of the behavior. If n input signals change at t, the behavior executes n times
and calls the post routine n times to post the resulting signal output at t + t_DELAY. Since the
behavior executes on each input signal change, the correct output posted to the active list always
corresponds to the last signal change for a given time, t. Therefore, for correct operation, if an
event to be posted matches the behavior id and time stamp of an event already in the active list,
the old event is replaced by the new event.
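
The replacement rule can be sketched as a variant of the post routine. This is a hedged illustration only: the record layout Rec and the function post_replace are hypothetical names, and the list is passed explicitly instead of through a global.

```c
#include <stdlib.h>

typedef struct Rec {
    int time;            /* event time */
    int beh_id;          /* id of the behavior that produced the event */
    int value;           /* possible new signal value */
    struct Rec *next;
} Rec;

/* Post an event in time order. If a record with the same behavior id and
 * time stamp already exists, overwrite its value instead of adding a
 * second record, so the last execution at a given time wins.
 * Returns the (possibly new) head of the list. */
Rec *post_replace(Rec *head, int time, int beh_id, int value)
{
    Rec **pp = &head;
    Rec *r;

    for (r = head; r != NULL; r = r->next) {
        if (r->time == time && r->beh_id == beh_id) {
            r->value = value;           /* replace the old event */
            return head;
        }
    }
    r = malloc(sizeof *r);
    if (r == NULL)
        abort();
    r->time = time; r->beh_id = beh_id; r->value = value;
    while (*pp != NULL && (*pp)->time <= time)
        pp = &(*pp)->next;
    r->next = *pp;
    *pp = r;
    return head;
}
```

If two inputs of the same gate change at time t, the gate posts twice for t + t_DELAY; the second call finds the first record and overwrites it, leaving a single event with the final output value.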

In  VHDL,  a  component  may  be  defined  to  have  an  inertial  or  transport  delay-type.  An  inertial 
delay  corresponds  to  components  which  require  input  signals  to  persist  for  a  given  time  before  the 
output signal changes. A transport delay is similar to a “wire delay”: the output takes the value of
the function of the inputs after the delay. The default delay-type for logic gates is inertial.

3.5.1 Transport Delays. Figure 14 shows an AND gate with a transport delay. The output
function, Out_1 = In_1 AND In_2 after gate_delay, occurs regardless of the time duration of the
input signals or any combination of input signals. Therefore, no special action is required when
posting the output to the active record list.

Figure 14. An AND Gate with a Transport Delay. (Inputs In_1 and In_2, output Out_1, delay = 3ns.)

3.5.2 Inertial Delays. The rule for inertial delays is that the output does not change within
the inherent delay of the logic gate. For active list management, if a behavior executes at time t
and its corresponding output is to be posted at t_NEW_EVENT = t + t_DELAY, and a signal change
can be found for the same behavior with a time, t_EVENT, where t < t_EVENT < t_NEW_EVENT,
then the output at t_EVENT is removed if the signal value at t_NEW_EVENT is the opposite of the
value at t_EVENT.

Figure 15 shows an AND gate with an inertial delay of 3ns. At 3ns, In_2 goes to a logic ‘1’
and the gate is executed. As a result, an output of ‘1’ is scheduled in the active list with a time
t_EVENT = 6ns. At 5ns, In_2 goes back to ‘0’, the gate is executed, and an output of ‘0’ is generated
at t_NEW_EVENT = 8ns. When the new event is posted, the change at t_EVENT is identified and
removed from the active list because (t = 5) < (t_EVENT = 6) < (t_NEW_EVENT = 8) and the value
at t_NEW_EVENT (‘0’) is the opposite of the value at t_EVENT (‘1’).
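
The inertial rule can be sketched as a filter applied before posting. The code below is an assumption-laden illustration, not the VSIM post routine: Rec and post_inertial are hypothetical names, signal values are single bits (so "opposite" is tested with !=), and the active list is passed explicitly.

```c
#include <stdlib.h>

typedef struct Rec {
    int time;            /* event time */
    int beh_id;          /* behavior that produced the event */
    int value;           /* bit value, 0 or 1 */
    struct Rec *next;
} Rec;

/* Post the output of behavior beh_id, executed at time t with the given
 * delay. Any pending record for the same behavior whose time lies strictly
 * between t and t + delay and whose value is the opposite of the new value
 * is removed first. Returns the head of the list. */
Rec *post_inertial(Rec *head, int t, int delay, int beh_id, int value)
{
    int t_new = t + delay;
    Rec **pp = &head;
    Rec *r;

    while (*pp != NULL) {
        r = *pp;
        if (r->beh_id == beh_id && r->time > t && r->time < t_new &&
            r->value != value) {
            *pp = r->next;              /* pulse shorter than the delay */
            free(r);
        } else {
            pp = &r->next;
        }
    }
    r = malloc(sizeof *r);
    if (r == NULL)
        abort();
    r->time = t_new; r->beh_id = beh_id; r->value = value;
    pp = &head;
    while (*pp != NULL && (*pp)->time <= t_new)
        pp = &(*pp)->next;
    r->next = *pp;
    *pp = r;
    return head;
}
```

Replaying the Figure 15 case (delay 3ns, executions at 3ns with output ‘1’ and at 5ns with output ‘0’) leaves only the event at 8ns with value ‘0’ on the list; the event at 6ns is cancelled.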
Figure 15. An AND Gate with an Inertial Delay.

3.6  Transformation  of  Intermediate  C  Code. 

The  intermediate  C  code  contains  the  circuit-specific  information.  During  the  simulation,  it 
is  this  code  which  instantiates  the  signals  and  behaviors,  and  their  interrelationships.  Also,  this 
code  contains  the  functions  that  describe  the  behavior  of  every  behavior  instance.4 

VSIM does not support every capability of VHDL. For example, processes with wait statements
are not supported. Also, complex behavioral processes are not supported, e.g., processes that
manipulate integers (instead of bits) as signals. As this project grows, more of the intermediate C
code can be included and compiled with VSIM. To make the intermediate C code compatible with
the current version of VSIM, the following general steps must be taken:5

•  Identify and extract the files that were generated during the model generate phase.

•  Modify the #include directives accordingly.

•  Remove calls to trace routines and other trace statements. VSIM does not support tracing
capabilities.

•  Modify the mksig() function call to include a field for the signal name. This is so VSIM
output can refer to signals by name instead of identifier.

•  Modify the behavior functions to report the name of the entity/architecture pair they represent
(if MAPPING is turned on in VSIM).

•  Change main() to vhdl_main() so VSIM can call it after initialization.

•  Modify the intermediate code to call VSIM’s init_cv() and sim_it() routines for circuit
initialization and to start the simulation, respectively.

4 Several behavior instances may share the same function.

5 The specific steps, and their implementation in pbuild and plex, are discussed in Chapter 4.

3.7  Parallel  VHDL  Simulation. 

3.7.1 SPECTRUM and VSIM. As shown in Figure 16, VSIM is run over SPECTRUM in
order to “parallelize” the simulation and evaluate the effectiveness of various protocols on parallel
VHDL simulations while requiring minimal modifications to the original application, VSIM.
SPECTRUM allows the application to be broken into LPs, and the LPs communicate with each other
with function calls to the “LP manager,” lp_man.c. These function calls can be interrupted by
“filters,” which may provide additional handshaking, clock, or queue management, as required for
various protocols. The main functions are

•  lp_init(). Ensures LPs are fully initialized. Builds filter tables, if any.

•  lp_get_event(). Gets the next event from the SPECTRUM queue.

•  lp_post_event(). Sends an event to the specified LP.

•  lp_advance_time(). Advances an LP’s local time.6

6 Recently, a terminate filter was added to SPECTRUM. VSIM was not modified to take advantage of this new
filter.
Figure 16. VSIM on the SPECTRUM Testbed (One LP Shown): VSIM, lp_man.c, cube2.c, hypercube

The  hardware  interface  to  the  Hypercubes  is  provided  in  the  functions  in  cube2.c.  In  general, 
lp_man.c  makes  these  calls,  and  the  application  (VSIM)  makes  only  LP-level  calls.  LPs  can  be 
partitioned  among  processors  in  a  number  of  ways.  Because  of  the  multitasking  capabilities  of  the 
Intel  80386,  a  “logical  process”  does  not  have  to  correspond  to  a  “physical  processor.”  Therefore,  a 
simulation  with  eight  LPs  can  be  partitioned  among  one  to  eight  processors  of  the  iPSC/2.7  On  the 
iPSC/860  Hypercube,  however,  there  must  be  a  one-to-one  mapping  of  LPs  to  processors,  because 
each  i860  processor  does  not  support  multitasking.8 

3.7.2 The SPECTRUM/VSIM Filters. The SPECTRUM filters for VSIM are based on a
previously  existing  filter  called  chanclocks.  These  filters  provide  the  null-message  protocol. 

In  general,  messages  among  LPs  are  signal  changes  with  the  structure  of  Figure  17.  Once  an 
event  is  received,  VSIM  converts  it  into  an  active  record  and  posts  it  in  the  active  list. 

7 AFIT’s iPSC/2 Hypercube has eight Intel 80386 processors.

8 The iPSC/860 Hypercube at Wright-Patterson AFB has eight Intel i860 processors.

typedef struct event {
    int from_lp;            /* lp id of lp sending event */
    int to_lp;              /* lp id of destination lp */
    int time;               /* timestamp of event */
    int event;              /* event type or number */
    int id;                 /* signal id */
    int value;              /* signal value */
    struct event *next;
};


Figure  17.  Event  Structure  for  Message  Passing 

For this discussion, t_NULL is the null message time, t_QUE is the lowest time stamp of an
LP’s SPECTRUM input queue, t_NEQ is the “low time” in the local LP’s active list (in VSIM), and
t_DELAY is the output delay of an LP.9

The safe time, t_SAFE, is the local virtual time (LVT) an LP can safely approach. It is the
minimum input time of all input arcs. In other words, an LP knows it does not receive a message
prior to this time, so it is safe to advance its LVT to t_SAFE. Incoming NULL messages are used
to update this safe time, and serve no other purpose.

Incoming events in SPECTRUM’s queue are stored in time order. Therefore, if an event at
the head of this queue has a time stamp less than or equal to t_SAFE, the event may be passed to
VSIM upon request. This is called a “valid event,” because by the Chandy-Misra paradigm, it is
guaranteed that no messages are received prior to t_SAFE.
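
The safe-time bookkeeping can be sketched as follows. This is a minimal illustration, assuming a fixed-size table of input arcs; InputArcs, note_message, safe_time, and is_valid_event are hypothetical names and are not taken from the SPECTRUM filters.

```c
#include <limits.h>

#define MAX_ARCS 8

typedef struct {
    int arc_time[MAX_ARCS];   /* latest time stamp seen on each input arc */
    int narcs;                /* number of input arcs in use */
} InputArcs;

/* Record the time stamp of an incoming message (event or null) on one arc.
 * Time stamps on an arc are assumed never to decrease. */
void note_message(InputArcs *in, int arc, int time)
{
    if (time > in->arc_time[arc])
        in->arc_time[arc] = time;
}

/* t_SAFE: the minimum input time over all input arcs. An LP with no
 * input arcs can never receive a message, so INT_MAX is returned. */
int safe_time(const InputArcs *in)
{
    int i, t = INT_MAX;
    for (i = 0; i < in->narcs; i++)
        if (in->arc_time[i] < t)
            t = in->arc_time[i];
    return t;
}

/* The event at the head of the input queue is "valid" if its time stamp
 * is less than or equal to t_SAFE. */
int is_valid_event(const InputArcs *in, int head_time)
{
    return head_time <= safe_time(in);
}
```

With two input arcs, a message at time 10 on the first arc leaves the safe time at 0 until the second arc also reports; a message at time 7 on the second arc then raises the safe time to 7.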

3.7.2.1 Rules for Null Messages. Null messages are used to avoid deadlock, as discussed
in Chapter 2. They are sent from an LP in three cases:

1.  Upon initialization, every LP sends a null message at time t_NULL = (0 + t_DELAY).

2.  When a signal is sent to another LP via an output arc at time t, all other output arcs are
sent a null message at time t.

3.  When VSIM requests a signal and SPECTRUM has a valid event, it is returned to VSIM. If
there is no event ready, the receive filter checks to see if t_NEQ < t_SAFE. If so, a NULL pointer
is returned and VSIM continues. Otherwise, the filter waits (blocks) for an incoming
event. When an LP is about to block, it sends a null message at t_NULL = min((t_SAFE +
t_DELAY), t_NEQ) to all downstream LPs. Therefore, deadlock is avoided because every LP
sends a “guarantee” that no messages are sent prior to t_NULL, and every downstream LP can
update its safe time; therefore, cyclic waiting does not occur.

9 Strictly speaking, there is a unique output delay for every output arc of an LP, but for this thesis, it is assumed
all output delays on each arc are the same.
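
Rule 3 can be written as a small decision function. The sketch below follows the rule as stated in the text; the names null_time, receive_action, and FilterAction are hypothetical, and the actual sending and blocking are left to the caller.

```c
/* t_NULL = min(t_SAFE + t_DELAY, t_NEQ), the time stamp of the null
 * message sent just before an LP blocks. */
int null_time(int t_safe, int t_delay, int t_neq)
{
    int t = t_safe + t_delay;
    return (t_neq < t) ? t_neq : t;
}

typedef enum { RETURN_EVENT, RETURN_NULL_PTR, BLOCK } FilterAction;

/* Decide what the receive filter does when VSIM requests a signal.
 * have_event is nonzero if the input queue is not empty, and t_que is the
 * time stamp at its head (ignored when the queue is empty). */
FilterAction receive_action(int have_event, int t_que, int t_safe, int t_neq)
{
    if (have_event && t_que <= t_safe)
        return RETURN_EVENT;        /* a valid event is ready */
    if (t_neq < t_safe)
        return RETURN_NULL_PTR;     /* local work is still safe to do */
    return BLOCK;                   /* caller sends a null message at
                                       null_time() and then waits */
}
```

For example, with t_SAFE = 10 and t_DELAY = 3, an LP whose next local event is at 20 announces t_NULL = 13, while one whose next local event is at 12 announces t_NULL = 12.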

3.7.3 Modifications to VSIM for Parallel Simulation. The VHDL simulation can be partitioned
in a number of ways. One method would be to allow each LP to share the behavior instances,
but partition the signals among the LPs. When a behavior executes, the LP determines the owner
of the resulting signal, and an event is sent to the corresponding LP. Another method, and the
one implemented in this research, is to allow the LPs to share signals, but partition the behaviors.
This way, only valid signal changes are sent to other LPs. When a signal does change, this event is
sent to all LPs with affected behaviors. The behavior list of any LP would consist of only behaviors
“owned” by that LP. Messages are introduced into the simulation cycle as shown in Figure 18.
This cycle is based on the sequential simulation cycle of Figure 12; however, signal changes that
affect  other  LPs  are  now  sent  to  those  LPs  as  events.  Similarly,  after  local  behaviors  are  executed 
and  posted,  if  any  upstream  events  are  forthcoming,  they  are  posted  in  the  active  record  list.  Each 
LP  runs  the  same  simulation,  but  with  different  data  in  terms  of  behaviors.  This  is  known  as  a 
single  program/multiple  data  (SPMD)  configuration  (21). 

A  parallel  simulation  in  a  2-LP  configuration  is  shown  in  Figure  19.  This  Figure  shows  the 
connectivity  if  each  LP  had  signal  changes  that  affected  behaviors  on  the  other  LP.  Another  possible 
configuration  for  2-LPs  could  be  that  only  one  LP  depended  on  the  other,  “upstream”  LP. 
Figure 18. Parallel VHDL Simulation Cycle Shown for One LP

Figure 19. A 2-LP Configuration

Figure 20. Data Flow for Incoming Event

The  process  of  receiving  an  event  is  shown  in  Figure  20.  Incoming  events  are  stored  by 
SPECTRUM  in  an  input  queue  until  requested  by  VSIM.  When  SPECTRUM  receives  events,  the 
input safe time is updated. When VSIM requests an event, the receive filter removes it from the
SPECTRUM queue (according to the rules for null messages in the previous section) and passes
it to VSIM. In turn, VSIM posts it in its local active list and continues the simulation.

The  main  simulation  loop  of  VSIM  must  be  modified  to  accommodate  parallel  operation.  In 
sequential  operation,  the  simulation  is  complete  when  the  active  list  and  behavior  list  are  both 
empty.  This  may  not  be  the  case  for  parallel  operation.  One  LP  may  have  empty  active  and 
behavior  lists,  but  an  upstream  LP  may  send  another  active  record  (signal  change)  to  be  put  in  the 
empty  active  list.  Therefore,  each  LP  must  run  until  the  maximum  simulation  time  is  reached,  as 
shown in Figure 21. In support of this change, the get_low_time() function is modified to return
the  maximum  time  if  the  active  list  is  empty.  This  method  is  correct  for  parallel  and  sequential 
operation. 
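
The modified low-time lookup might look like the sketch below. It is an illustration only: the record type Rec, the value chosen for MAXTIME, and passing the active list as an argument (VSIM uses a global list) are all assumptions.

```c
#include <stddef.h>

#define MAXTIME 1000      /* hypothetical maximum simulation time */

typedef struct Rec {
    int time;             /* event time */
    struct Rec *next;
} Rec;

/* The active list is kept in time order, so the head holds the lowest
 * time. An empty list returns MAXTIME, which ends the main loop once no
 * further events arrive from upstream LPs. */
int get_low_time(const Rec *actv)
{
    return (actv != NULL) ? actv->time : MAXTIME;
}
```

Because the loop condition compares the simulation clock with the maximum time, an LP with empty lists keeps polling for upstream events until the clock is set to that maximum.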
sim_it()
{
    SIG_REC *signal;

    while (*sim_time < MAXTIME) {
        while (tmpbeh != NULL) {
            execute_behavior();                 /* execute, and post */
            remove_behavior();
        }
        get_signal();                           /* get from other LP and post */
        update_sim_time(get_low_time());        /* process low time */
        while (signal = active_exists(*sim_time)) {
            if (unchanged(signal)) {            /* compare values */
                remove_signal(signal);
            }
            else {
                update_signal(signal);
                schedule_behaviors(signal);     /* including sending to other LPs */
                remove_signal(signal);
            }
        }
    }
    end_sim();
}


Figure  21.  Main  VSIM  Simulation  Loop  Modified  for  Parallel  Operation 
int lp_own[MAX_BEHAVIORS];      /* node location of each behavior */


Figure  22.  Structure  Identifying  LP  Ownership  of  Each  Behavior 

Figure  21  is  also  modified  to  do  a  get_signal()  after  all  behaviors  have  executed  for  a 
given  time.  This  function  calls  lp_get_event()  from  SPECTRUM,  converts  the  event  into  an 
active  record,  and  posts  the  new  record  into  the  active  list.  The  send_signal()  routine  is  called 
from  the  schedule_behaviors()  function.  This  way,  as  behaviors  are  scheduled  on  the  local 
LP,  it  can  check  to  see  which  other  LPs  have  behaviors  dependent  on  the  signal  change.  The 
send_signal()  function,  in  turn,  builds  an  event  out  of  the  signal  change  and  calls  SPECTRUM’S 
lp_post_event().

Because  each  LP  must  know  which  behaviors  it  owns,  a  few  modifications  to  VSIM  data 
structures  must  be  made.  VSIM  is  modified  to  read  in  a  mapping  of  behaviors  to  LPs,  and  each  LP 
has  this  information  in  the  array  shown  in  Figure  22.  In  order  to  generate  this  mapping  file,  the 
user  must  determine  the  behavior  numbers  and  dependencies.  To  do  this,  VSIM  is  run  in  sequential 
mode with MAPPING defined in its header file. The corresponding output is run through a program
called  vmap,  which  generates  a  list  of  behavior  numbers,  names,  delays,  and  dependencies.  The 
user  can  then  use  this  data  to  specify  which  behaviors  are  grouped  to  which  LPs.10  The  specific 
LP  to  processor  configuration  is  defined  at  run  time. 
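
One way the ownership array could be used when a signal changes is sketched below. This is a hedged illustration, not the send_signal() routine itself: remote_destinations, MAX_LPS, and the array sizes are hypothetical, and the connection list is given as an array of behavior numbers.

```c
#define MAX_BEHAVIORS 64
#define MAX_LPS 8

int lp_own[MAX_BEHAVIORS];   /* LP that owns each behavior */

/* Given the behaviors connected to a changed signal, mark each remote LP
 * that owns at least one of them. dest[lp] is set to 1 for every LP that
 * must receive the event; the local LP is never marked. Entries of lp_own
 * are assumed to lie in the range 0 .. MAX_LPS - 1.
 * Returns the number of remote LPs marked. */
int remote_destinations(const int *conns, int nconns, int my_lp,
                        int dest[MAX_LPS])
{
    int i, lp, count = 0;

    for (lp = 0; lp < MAX_LPS; lp++)
        dest[lp] = 0;
    for (i = 0; i < nconns; i++) {
        lp = lp_own[conns[i]];
        if (lp != my_lp && !dest[lp]) {
            dest[lp] = 1;            /* one event per LP, not per behavior */
            count++;
        }
    }
    return count;
}
```

If behaviors 1 and 2 both live on LP 1 and read a signal that changes on LP 0, only one event is sent to LP 1; behavior 0, owned by LP 0, is scheduled locally.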

Also, the signal record structure is modified to contain an “ownership” flag, as shown in
Figure 23. Since there is a one-to-one correspondence between behaviors and their signal outputs,
after behaviors are executed and the corresponding signal records are created, the ownership flag
is set to TRUE for that LP. LPs are responsible for sending and/or reporting signal changes for those
signals that they “own.”

10 This procedure is currently done manually, unless a random assignment of behaviors to LPs is used.

typedef struct {                /* signal record */
    UINT32 id;                  /* id */
    char *name;                 /* name */
    unsigned size: 4;           /* size of data value (bytes) */
    UINT32 cval;                /* current value (offset) */
    CONNT *conns;               /* behavioral connections */
    BOOL i_own;                 /* for LP ownership */
} SRREC;

Figure 23. Basic Structure for Signal Records Modified to Identify LP Ownership

3.8  Summary. 

VHDL  circuits  are  compiled  with  the  Intermetrics  VHDL  toolset,  and  intermediate  C  code 
is  intercepted  and  transformed  to  run  with  AFIT’s  parallel  VHDL  simulator.  VSIM  runs  either 
sequentially  on  a  single  processor,  or  in  parallel  on  the  Intel  iPSC/2  or  iPSC/860  Hypercubes.  For 
parallel  simulations,  VSIM  runs  over  the  SPECTRUM  testbed.  This  allows  various  protocols  to  be 
tested  by  changing  filters  instead  of  making  significant  modifications  to  VSIM.  Behavioral  instances 
are  grouped  into  LPs  and  the  LPs  are  distributed  among  the  Hypercube’s  processors. 

This  chapter  identified  the  key  data  structures,  the  simulation  cycle,  and  the  methodology 
for  breaking  VHDL  simulations  into  multiple  LPs  and  running  on  multiple  processors. 
IV.  Implementation


4.1  Introduction.

This  chapter  describes  the  implementation  of  the  postprocessor  functions  pbuild  and  plex, 
and  the  VSIM  interface  to  the  Intel  Hypercubes  with  SPECTRUM.  Also,  implementation  of  the 
null-message  protocol  using  SPECTRUM  filters  is  discussed.  For  examples  of  some  of  the  key 
source  code  that  realizes  this  implementation,  refer  to  Appendix  G. 


4.2  Postprocessor  Implementation.

As  shown  in  Table  1,  even  small  VHDL  circuit  simulations  are  composed  of  thousands  or  tens 
of  thousands  of  lines  of  C  code  just  for  circuit  description,  i.e.,  not  including  simulator  code.  Large, 
flat  structural  descriptions  lead  to  very  large  intermediate  files.  It  is  better  to  build  structural 
circuits  hierarchically  and  use  a  number  of  intermediate  configuration  descriptions  than  to  use  one 
overall  configuration  file.  The  multiplier  in  Table  1  is  configured  hierarchically,  while  the  shifters 
are  configured  as  one  large  structural  description.  Even  though  the  multiplier  has  three  times  as 
many  gates  as  the  16-bit  shifter,  the  intermediate  code  is  37%  smaller. 

In  order  to  decrease  the  amount  of  time  required  to  transform  this  code  into  code  compatible 
with  VSIM,  a  program  called  pbuild  is  created  to  automate  this  process. 


Table 1.  Length of Intermediate C Code Circuit Descriptions

Simulation                      File Size (bytes)   Lines of Code
SR flip-flop                                                 1304
edge-triggered D flip-flop                                   1964
full adder                                  61155            2350
8-bit carry save adder                     639757           26651
8-bit carry lookahead adder                569576           23106
8-bit ripple carry adder                   504540           20700
8x8 Wallace tree multiplier                564956           22032
16-bit bit/byte shifter                    900307           34192
32-bit bit/byte shifter                   1603124           59967

Pbuild reads the compilation script file generated during the build phase, concatenates the C
files  that  make  up  the  specific  simulation,  and  calls  plex,  which  transforms  the  data  by  the  rules 
specified  by  Comeau  (10:4-6)  and  in  this  thesis. 

The  user  must  determine  the  name  of  the  build  script  generated  during  the  build  phase.  This 
can  be  accomplished  by  adding  the  Intermetrics  debug  switch  -debug=cknd  to  the  mg  and  build 
commands. Then, after the build phase completes, the script filename is reported. For the example
of Figure 24 (a model-generate and build session for an edge-triggered D flip-flop discussed in
Appendix B.3), the compilation script is FN23309.

An example build script (for the edge-triggered D flip-flop) is shown in Figure 25. This script
generates an executable simulation called FN23307, located in /home/inter/shiplib/tbreeden.
This directory is where the files in the user’s work library are located. The
intermediate C files required for VSIM are the main (FN23311.c) and the .c files that correspond
to the .o files in the work directory. For each .o file in /home/inter/shiplib/tbreeden of
Figure 25, the number in the corresponding .c filename is “two greater” than that of its .o file. For
example, the .c file that corresponds to FN23304.o is FN23306.c. The program pbuild reads this
script, recognizes the work library’s main and object files, and concatenates the corresponding
main and .c files, as shown in Figure 26. From this point, pbuild calls plex for data transformation.
If the path to the build script is not specified by the user, pbuild can determine it from the UNIX
environment variables VHDL_LIBROOT and LOGNAME, which in this case would return
/home/inter/shiplib and tbreeden, respectively.1 This works as long as models are compiled and
model generated in the user’s work directory; otherwise, the user may have to specify the complete
path to the build script on the command line when invoking pbuild.
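
The “two greater” naming rule can be sketched in C as follows. This is only an illustration under stated assumptions: the function name c_name_from_o() is hypothetical, the filename is assumed to consist of a two-letter uppercase prefix, a decimal number without leading zeros, and the extension .o, and the prefix is assumed to be copied unchanged. It is not the code pbuild actually uses.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Derive the intermediate .c filename from a work-library .o filename by
 * adding two to the numeric part, e.g. "FN23304.o" -> "FN23306.c".
 * Returns 0 on success, -1 if the name does not have the assumed form
 * or the output buffer is too small.
 */
int c_name_from_o(const char *o_name, char *out, size_t out_len)
{
    char prefix[3];          /* two letters plus terminating NUL */
    unsigned long num;
    char ext;
    int used = 0;
    int n;

    if (sscanf(o_name, "%2[A-Z]%lu.%c%n", prefix, &num, &ext, &used) != 3)
        return -1;           /* not <letters><number>.<char> */
    if (ext != 'o' || o_name[used] != '\0')
        return -1;           /* wrong extension or trailing text */

    n = snprintf(out, out_len, "%s%lu.c", prefix, num + 2);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Applied to each .o file named in the build script, such a helper would yield the list of .c files to concatenate after the main file.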

After  extracting  and  concatenating  the  correct  files,  pbuild  calls  plex  via  the  operating 
system,  also  shown  in  Figure  26.  The  plex  program  was  created  using  C  and  a  UNIX  program 

1The path /home/inter/shiplib is the explicit path on lovelace in the VLSI lab. A logically equivalent path is
/usr/vhdl/shiplib, which works on any machine in the VLSI lab.
lovelace.~/vhdl/etdff> mg '-debug=cknd nand_gate(simple)'
Object_file : /home/inter/shiplib/tbreeden/FN23067.o
H file      : /home/inter/shiplib/tbreeden/FN23068
C file      : /home/inter/shiplib/tbreeden/FN23069.c
Standard VHDL 1076 Support Environment Version 2.1-1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

lovelace.~/vhdl/etdff> mg '-debug=cknd three_input_nand_gate(simple)'
Object_file : /home/inter/shiplib/tbreeden/FN23077.o
H file      : /home/inter/shiplib/tbreeden/FN23078
C file      : /home/inter/shiplib/tbreeden/FN23079.c
Standard VHDL 1076 Support Environment Version 2.1-1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

lovelace.~/vhdl/etdff> mg '-debug=cknd etdff(structural)'
Object_file : /home/inter/shiplib/tbreeden/FN23289.o
H file      : /home/inter/shiplib/tbreeden/FN23290
C file      : /home/inter/shiplib/tbreeden/FN23291.c
Standard VHDL 1076 Support Environment Version 2.1-1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

lovelace.~/vhdl/etdff> mg '-debug=cknd etdff_test_bench(structural)'
Object_file : /home/inter/shiplib/tbreeden/FN23299.o
H file      : /home/inter/shiplib/tbreeden/FN23300
C file      : /home/inter/shiplib/tbreeden/FN23301.c
Standard VHDL 1076 Support Environment Version 2.1-1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

lovelace.~/vhdl/etdff> mg '-debug=cknd -top etdff_config'
Object_file : /home/inter/shiplib/tbreeden/FN23304.o
H file      : /home/inter/shiplib/tbreeden/FN23305
C file      : /home/inter/shiplib/tbreeden/FN23306.c
Standard VHDL 1076 Support Environment Version 2.1-1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

lovelace.~/vhdl/etdff> build '-debug=cknd -replace -ker=etdff etdff_config'
Kernel com file is /home/inter/shiplib/tbreeden/FN23309
Standard VHDL 1076 Support Environment Version 2.1-1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.


Figure  24.  Example  VHDL  Model-generate  And  Build  Session  for  an  Edge-triggered  D  Flip-Flop 
#!/bin/csh

if ( $?VHDL_LIBSIM == 0 ) then
    if ( ! -e /usr/local/lib/libsim.a ) then
        echo NOLIB > bld_5854cnl.log
        exit 1
    endif
    setenv VHDL_LIBSIM -lsim
else if ( ! -e $VHDL_LIBSIM ) then
    echo LIBSM > bld_5854cnl.log
    exit 2
endif

cc -g -o /home/inter/shiplib/tbreeden/FN23307 \
    /home/inter/shiplib/tbreeden/FN23311.c \
    /home/inter/shiplib/tbreeden/FN23304.o \
    /home/inter/shiplib/tbreeden/FN23077.o \
    /home/inter/shiplib/tbreeden/FN23067.o \
    /home/inter/shiplib/tbreeden/FN23289.o \
    /home/inter/shiplib/tbreeden/FN23299.o \
    /usr/vhdl/shiplib/std/FN240.o \
    /usr/vhdl/shiplib/std/FN235.o \
    /usr/vhdl/shiplib/std/FN225.o \
    /usr/vhdl/shiplib/std/FN25.o \
    $VHDL_LIBSIM -lcurses -ltermlib -lm -lc >& bld_5854cnl.log
exit $status


Figure  25.  Example  Compilation  Script  Generated  During  Intermetrics’  Build  Phase 


lovelace.~/vhdl/etdff> pbuild FN23309 etdff.c
cp /home/inter/shiplib/tbreeden/FN23306.c big_etdff.c
cat /home/inter/shiplib/tbreeden/FN23079.c >> big_etdff.c
cat /home/inter/shiplib/tbreeden/FN23069.c >> big_etdff.c
cat /home/inter/shiplib/tbreeden/FN23291.c >> big_etdff.c
cat /home/inter/shiplib/tbreeden/FN23301.c >> big_etdff.c
cat /home/inter/shiplib/tbreeden/FN23311.c >> big_etdff.c
plex < big_etdff.c > etdff.c
Transformation in progress...


Figure 26.  Result of Reading Compilation Script by pbuild
[Diagram not recoverable; surviving labels: FN23309 (build script), etdff.c]

Figure 27.  Relationships of the Postprocessor Files

called lex. Lex reads a specification file containing UNIX regular expressions and the C routines that
are associated with them, and generates a scanner. When the scanner reads its input, character
patterns are matched by the rules of the specified regular expressions, and the associated C routines
are called to manipulate the input. For more information on lex, see (23).

Figure  27  shows  the  necessary  files  and  relationships  among  them  for  the  complete  postpro¬ 
cessor.  The  files  relate  to  the  edge-triggered  D  flip-flop  examples  of  Figures  24,  25,  and  26.  The  user 
only has to invoke pbuild, which controls the transformation process. The files plex_routines.c
and stack.c are used by plex to manipulate the data once a regular expression has been recognized.

4.2.1 Transformation Steps. Pbuild transforms the Intermetrics compiler-generated .c files
into a single .c file that, along with the associated header files, can be transferred to the iPSC/2
or iPSC/860 and run with VSIM. The following steps are taken to transform the intermediate C
code:2


1.  The  intermediate  C  code  representing  the  VHDL  configuration  file  is  brought  in  first,  and  it 
contains  all  the  necessary  #include  directives.3  Therefore,  all  #include  directives  after  this 
file  are  removed. 

2. All lines containing #include fn26 or #include FN26 are deleted. The necessary header
information for VSIM simulations is combined in vsim.h, which is already on the hypercube.4

3. All remaining #include directives are changed to the proper path. For example, if the path
was /home/inter/shiplib/tbreeden/FN2858, it is changed to FN2858.

4.  All  lines  that  contain  “{trace”  are  changed  to  “{”,  i.e.,  “trace”  to  the  end  of  line  is  deleted. 
VSIM  does  not  support  tracing  capabilities. 

5. Each occurrence of if (trceqp) { ... } is deleted. These if statements contain code used
with the Intermetrics simulator when it is in “trace” mode.

6.  To  complete  the  removal  of  trace-related  statements,  every  line  containing  the  strings  “trace” 
or  “TRAREC”  is  deleted. 


7. The last function call from the main routine is Z5xxxxxx (where xxxxxx can be any series of
numbers and letters). The statement “cv = init_cv();” is inserted before the first line in
this function. This new function call (init_cv()) is used to perform initialization functions
for the parallel simulator. In the third line of the same function the statement “sim_it();”
is added. This routine starts the simulation.


2For  examples  of  how  steps  1  through  10  are  implemented,  see  Figures  4.4  through  4.17  of  Comeau’s  thesis  (10). 

3This is not true for structural models created hierarchically with a number of configuration descriptions. For such
models, the user must add the include directives to the ckt.c file. The files to include can be found by examining
the big_ckt.c file, or the appropriate .c files reported during the model generate phase. For more information, refer
to the User’s Guide.

4Previously, vsim.h was simtl.h, which is what Intermetrics uses. The header files are different; therefore, the
filenames were changed to avoid confusion.
8. The main() routine has six subroutine calls. The four in the middle are deleted. These
functions are either not supported by the current VHDL subset, or have been replaced by
init_cv() and sim_it() above.

9. The name “main()” is changed to “vhdl_main().” With VSIM, the main() routine is found
in the file vsim.c, which calls vhdl_main() if the simulation is run sequentially. If the
simulation is run in parallel, the address of a startup() routine is passed as the starting
address of each LP. From there, startup() calls vhdl_main() in the intermediate C code.

10. For getting output in the parallel environment, the name of each signal must be added to
the signal structure after it is instantiated. Each mksig() function call returns either
a scalar or a bit vector value, depending on the type of signal.

• If mksig() is assigned to a variable such as (*cd).Zxxxxxxx, it is a scalar assignment.
On the line below the mksig string, “(*cd).PARM1 -> name = *(PARM2);” is added,
where PARM1 is the Zxxxxxxx string to which mksig is being assigned. PARM2 is the first
parameter that appears in the m_signal subroutine call that is six lines below.

• If mksig() is not assigned to a scalar, then it is a bit vector assignment. Four lines
above these assignments, the statement “lastsig = sigarr + NUM1 - NUM2;” is found,
where NUM1 and NUM2 are integer values. Add “loop_counter = NUM1 - NUM2;” below
that line. Then, after the line with the mksig string, the following statements are added:

    temp_name = (char*)calloc(sizeof(PARM1) + 5, sizeof(char));
    sprintf(temp_name, "%s(%d)", PARM1, loop_counter--);
    (*(sigarr - 1)) -> name = temp_name;

where PARM1 is a string which appears 7 lines below as Z30000xxx.xxxxxxxxxxxxx.
11. Needless function calls are deleted. This was recommended, but not implemented, by
Comeau since his data transformations were done by hand. Instead of deleting the calls,
he wrote “dummy” functions. The following function calls are not required and are deleted:

• close_sigdict()
• m_int_type()
• m_real_type()
• m_signal()
• pop()
• push()
• read_input()
• rmtrrec()
• rptstats()
• rpterr()
• Start_Nonarray_Comp()
• sched()
• timer()
• tpop()

12. Every behavior instance’s “function behavior” is modified to report its entity/architecture
name if MAPPING is defined in VSIM and the boolean variable mapping is still true. Each
of these function declarations is of the form Zxxxxxxx_xxxx(bi). Inside the function, after
local declarations, put the following:

    #ifdef MAPPING
    if (mapping)
        printf("%s\n", Zxxxxxxx_xxxx_trcbck);
    #endif


This  step  is  also  new,  and  an  example  is  shown  before  and  after  in  Appendix  F. 
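
Steps 4 and 6 above can be illustrated with a minimal line-oriented C sketch. This is not how plex is built (plex uses lex rules, described in the next section); the type line_action and the function strip_trace() are hypothetical names, and the sketch assumes that “{trace” appears with no whitespace between the brace and the word.

```c
#include <string.h>

typedef enum { KEEP_LINE, DROP_LINE } line_action;

/*
 * Apply steps 4 and 6 to one NUL-terminated line, editing it in place.
 * Step 4: "{trace ..." becomes "{" (the rest of the line is deleted).
 * Step 6: any line still containing "trace" or "TRAREC" is dropped.
 */
line_action strip_trace(char *line)
{
    char *brace = strstr(line, "{trace");

    if (brace != NULL) {
        int had_newline = (strchr(brace, '\n') != NULL);
        if (had_newline) {
            brace[1] = '\n';   /* safe: "{trace" is longer than "{\n" */
            brace[2] = '\0';
        } else {
            brace[1] = '\0';
        }
    }
    if (strstr(line, "trace") != NULL || strstr(line, "TRAREC") != NULL)
        return DROP_LINE;
    return KEEP_LINE;
}
```

A driver would read the concatenated file line by line and write out only the lines for which strip_trace() returns KEEP_LINE.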
ws              [ \t]*
comment         (\/\*)[^\n]*\n
include         #{ws}include[^\n]*\n
include_fn26    #{ws}include{ws}\"[Ff][Nn]26\"{ws}\n
trace           \{{ws}trace[^\n]*\n
if_trceqp       if{ws}\({ws}trceqp{ws}\){ws}
trc_or_trarec   [^\n]*((trace)|(TRAREC))[^\n]*\n
main            main{ws}\({ws}\){ws}[^;]
Z1_call         Z1([0-9]|[A-Z])*{ws}\({ws}\){ws};
Z5_function     Z5([0-9]|[A-Z])*{ws}\({ws}\){ws}[^;]
mksig_a         \(\*cd\)[^\n]*mksig
mksig_b         \n{ws}lastsig{ws}=
exec            \nZ([0-9]|[A-Z])*_[0-9]*{ws}\(bi\)

Figure  28.  Regular  Expressions  Required  to  Identify  Data  to  be  Transformed 

4.2.2 Lex Descriptions of the Transformation Steps. As each regular expression is matched
in lex, the lex macros ECHO, input(), output(), and unput() are used in conjunction with a
character stack to manipulate the source code according to the rules above. The plex.l file
contains the lex description of these rules. Figure 28 shows the regular expressions and Figure 29
shows function calls used in the lex description to translate the data. These two figures make
up plex.l. For example, the definitions of Figure 28 show whitespace (ws) to be zero or more
blank spaces or tabs; a comment is recognized by a /* to the end of line (taking advantage of the
intermediate C code’s one-line comments); and an include directive is defined to be a pound sign,
followed by white space, followed by the word include, to the end of line; etc. Then, the rules
in Figure 29 use these definitions to recognize parts of the code that require modification and to
implement those modifications.

The  twelve  steps  of  the  postprocessor  are  accomplished  in  Figure  29  as  follows: 

1. Step 1. The function check_include() is called to remove the unnecessary directives.

2. Step 2. The #include fn26 directives are deleted by not echoing them to the output file.
{comment}         { lineno++;                 /* assumes comments on one line */
                    numcomments++;            /* count the comments */
                    ECHO;                     /* echo comment to output */
                  }
{include_fn26}    { lineno++;                 /* do nothing, i.e., delete line */
                    output('\n');             /* output a newline */
                    num_inc_del++;            /* count deleted #include fn26 */
                  }
{include}         { check_include(); }        /* evaluate and modify include directives */
{trace}           { fix_trace(); }            /* {trace ... to { ... */
{if_trceqp}       { del_if_trceqp(); }        /* delete if(trceqp){...} structures */
{trc_or_trarec}   { lineno++;                 /* do nothing, i.e., delete line */
                    num_trc_or_trarec++;
                  }
{main}            { do_main(); }              /* adjust main function */
{Z1_call}         { /* do nothing if in main, i.e., don't ECHO */
                    if (!found_main) ECHO;
                    else Z1_calls_del++;
                  }
{Z5_function}     { do_Z5_function(); }       /* modify Z5xxxx() functions */
{mksig_a}         { do_mksig_a(); }           /* modify bit mksig() function calls */
{mksig_b}         { do_mksig_b(); }           /* modify bit vector mksig calls */
{exec}            { add_mapping(); }          /* add #ifdef MAPPING directive */
close_sigdict{ws}\(         { del_fn_call(); }   /* delete function calls... */
m_int_type{ws}\(            { del_fn_call(); }
m_real_type{ws}\(           { del_fn_call(); }
m_signal{ws}\(              { del_fn_call(); }
pop{ws}\(                   { del_fn_call(); }
push{ws}\(                  { del_fn_call(); }
read_input{ws}\(            { del_fn_call(); }
rmtrrec{ws}\(               { del_fn_call(); }
rptstats{ws}\(              { del_fn_call(); }
rpterr{ws}\(                { del_fn_call(); }
Start_Nonarray_Comp{ws}\(   { del_fn_call(); }
sched{ws}\(                 { del_fn_call(); }
timer{ws}\(                 { del_fn_call(); }
tpop{ws}\(                  { del_fn_call(); }
\n                { lineno++;
                    ECHO;
                  }
.                 { ECHO; }

3. Step 3. The paths of all remaining include directives are modified in check_include().

4. Step 4. {trace ... is changed to { ... by calling the fix_trace() function.

5. Step 5. Occurrences of if (trceqp) {...} are deleted in the del_if_trceqp() routine.

6. Step 6. Lines containing trace and TRAREC are deleted by not echoing them to the output
file.

7. Step 7. The Z5xxxxxx() function is modified in do_Z5_function().

8. Steps 8 and 9. The main function is modified by do_main().

9. Step 10. Scalar signals are modified in do_mksig_a(), and bit vector signals are modified in
do_mksig_b().

10. Step 11. All unnecessary function calls are removed by calling del_fn_call().

11. Step 12. Reporting of their entity/architecture name is added to behavior functions in
add_mapping().

Finally,  when  the  intermediate  C  code  has  been  completely  transformed,  a  report  is  generated, 
such  as  shown  in  Figure  30. 


4.3  Interfacing  VSIM  with  SPECTRUM. 

SPECTRUM  provides  support  for  running  concurrent  processes  on  the  Intel  iPSC/2  and 
iPSC/860 Hypercubes. For VSIM, the concurrent processes each run the VHDL simulation cycle
as  described  in  Chapter  3.  The  behaviors  are  partitioned  among  the  processes  and  interprocess 
communication  is  accomplished  via  calls  to  SPECTRUM. 

4.3.1 Main SPECTRUM Functions. All functions discussed in this section are listed in
Appendix  G. 

4.3.1.1 Initialization. Prior to running a parallel simulation using SPECTRUM, the
number  of  logical  processes  is  specified  in  a  header  file.  When  the  simulation  begins,  a  call  to 
lp_level_init()  is  made  to  establish  the  following: 

•  The  LP  relationships. 

•  The  address  of  the  starting  procedure  for  each  LP. 


•  The addresses of any filters used by all LPs.


Approx lines:                                2710
Comments:                                       5
#include directives modified:                   5
#include directives removed:                   13
{trace ... changed to { ...                    28
if(trceqp) tests removed:                      35
"trace" or "TRAREC" lines removed:            223
Z1xxxxxx() calls removed:                       4
Z5xxxxxx() functions modified:                  1
Scalar "mksig" assignments modified:           18
Bit vector "mksig" assignments modified:        0
#ifdef MAPPING added:                          14
Other function calls removed:
    close_sigdict():                            1
    m_int_type():                               0
    m_real_type():                              1
    pop():                                     21
    push():                                    21
    read_input():                               1
    rmtrrec():                                  0
    rptstats():                                 1
    rpterr():                                  23
    Start_Nonarray_Comp():                      0
    sched():                                    0
    timer():                                    1
    tpop():                                    31

Figure 30.  Example Postprocessor Report


LP relationships are specified by the user in an lp.arcs file. The specifications for this file are
found  in  the  SPECTRUM  user’s  guide  (16)  and  in  the  AFIT  Parallel  VHDL  user’s  guide  (4). 

The function vspec_init() builds a table of function pointers for SPECTRUM. Each function
pointer represents the starting code for the simulation on each LP. For VSIM, all LPs start
with the routine startup(). Therefore, every entry in the array functions[] is loaded with the
address of startup(). Finally, a call is made to SPECTRUM’s lp_level_init(), where SPECTRUM
initializes and each LP calls startup(). In turn, startup() calls the intermediate C code’s
vhdl_main(), where the circuit to be simulated is configured, and the simulation begins on each
LP.
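
The table of starting procedures can be pictured with the following C sketch. It is illustrative only: the LP count NUM_LPS, the pointer type lp_start_fn, and the helpers fill_start_table() and run_all_lps() are assumptions made for this example, and startup() is reduced to a counter rather than a call to vhdl_main(). SPECTRUM's real lp_level_init() interface is not reproduced here.

```c
#include <stddef.h>

#define NUM_LPS 4                    /* assumed number of logical processes */

typedef void (*lp_start_fn)(void);   /* assumed starting-procedure type */

int lps_started = 0;                 /* stand-in for real per-LP work */

/* Stand-in for VSIM's startup(); the real routine calls vhdl_main(). */
void startup(void)
{
    lps_started++;
}

lp_start_fn functions[NUM_LPS];      /* one starting procedure per LP */

/* Load every table entry with the address of startup(). */
void fill_start_table(void)
{
    size_t i;
    for (i = 0; i < NUM_LPS; i++)
        functions[i] = startup;
}

/* Stand-in for the testbed invoking each LP's starting procedure. */
void run_all_lps(void)
{
    size_t i;
    for (i = 0; i < NUM_LPS; i++)
        functions[i]();
}
```

Because every LP starts in the same routine, the table is uniform; a testbed that allowed different code per LP would simply load different addresses.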


4.3.1.2 Sending Signal Changes. When VSIM identifies a signal change that is required
by another LP, it uses a function called send_signal to build an event and call SPECTRUM’s
lp_post_event(). SPECTRUM sends the event to the specified LP after the send filter
performs the protocol-necessary functions, as discussed in the filter section.

4.3.1.3 Receiving Signal Changes. An LP receives a signal by making a call to SPECTRUM’s
lp_get_event(). The event is then made into a signal record and posted in the active
list by a function called receive_signal(). If a null pointer is returned from lp_get_event(),
this indicates that no event was ready to return and the local LP can safely execute without an
event from another LP. This determination is made by the receive filter.

4.3.1.4 Clock Management. VSIM and SPECTRUM each have a local clock for every
LP, both implemented as integers. When an LP updates the VSIM clock, it passes this time to
SPECTRUM’s lp_advance_time() to keep the clocks synchronized.5

5The SPECTRUM clock is synchronized with the VSIM clock in each LP. This does not mean that every LP has
the same time, only that the VSIM clock and the SPECTRUM clock on each LP have the same time. The LPs run
asynchronously by the rules of the null-message protocol.
4.3.2 Implementation of SPECTRUM Filters for VSIM. The filters are used to implement
the null-message protocol for parallel simulations. The theory behind this protocol is discussed in
Chapters 2 and 3. The filters used by VSIM are based on an existing set of filters called chanclocks,
established at AFIT. The receive filter is modified for VSIM to take into consideration the two event
queues: SPECTRUM’s input event list and VSIM’s active list.

Channel  times  are  introduced  to  track  the  safe  time  and  the  output  send  times.  As  discussed 
in  Chapter  3,  safe  time  is  defined  as  the  minimum  input  channel  time  of  all  input  arcs.  Output 
channel  times  are  tracked  to  avoid  sending  null  messages  when  they  are  not  necessary.6 

4.3.2.1 The Initialization Filter. When VSIM calls lp_init(), an initialization filter
is used to instantiate and initialize channel times for the input and output arcs defined in the
lp.arcs file. Also, a null message is sent to every downstream LP with a time stamp of t_lp_delay.

4.3.2.2 The Send Filter. When an LP sends another LP a signal, null_post_filter()
sends a null message to every other downstream LP with the same time stamp. Also, the channel
time for each output arc is updated.
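
The bookkeeping performed on the send side can be sketched in C as follows. This is a simplified illustration, not the null_post_filter() source: the function name post_with_nulls() and the array representation of output channel times are assumptions, and the actual transmission of messages is replaced by a counter. It combines the send-filter rule with the stated purpose of output channel times, namely skipping a null message to an LP that has already been sent a message with the same time stamp.

```c
#include <stddef.h>

/*
 * An LP posts a real event with time stamp t on output arc `dest`.
 * For every other output arc whose channel time is still earlier than t,
 * a null message with time stamp t would be sent. All affected channel
 * times are advanced to t. Returns the number of null messages needed.
 */
size_t post_with_nulls(long *out_chan_time, size_t n_out,
                       size_t dest, long t)
{
    size_t i;
    size_t nulls = 0;

    for (i = 0; i < n_out; i++) {
        if (i == dest) {
            out_chan_time[i] = t;      /* the real event carries time t */
        } else if (out_chan_time[i] < t) {
            out_chan_time[i] = t;      /* a null message would go here */
            nulls++;
        }
        /* else: this LP already holds a message for time t or later */
    }
    return nulls;
}
```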

4.3.2.3 The Receive Filter. The receive filter, null_get_fltr(), is used to get events
from upstream LPs. It determines if the local LP is able to proceed, i.e., at least one message
has been received from each upstream LP and the time of the next event in SPECTRUM’s queue
is less than the safe time. If so, the event is valid (no event will be received with an earlier time
stamp) and returned to VSIM.7

If the LP cannot return a message, it “peeks” at VSIM’s active list to get the next event time.
If this time is less than the safe time, the filter returns, causing a NULL pointer to be returned
by lp_get_event(). When VSIM receives the NULL pointer, it proceeds without adding a new
record to the local active list.

6Since null messages are only used to avoid deadlock, if a message has been sent to another LP at time t, there
is no need to (possibly) send another null message to the same LP at time t.

7Input null messages are stripped out.

If the receive filter cannot return a valid message and the next event time is greater than the
safe time, then the filter blocks after sending a null message to every downstream LP, guaranteeing
that no message will be sent any sooner. In this way, deadlock is avoided. The rule, as discussed in
Chapter 3, is that an LP sends a null message to every downstream LP with a time stamp equal to
either VSIM’s next event time or the sending LP’s safe time plus output delay.
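
The three outcomes of the receive filter described above can be summarized in a small C sketch. It is a decision table only, under stated assumptions: the names filter_action, safe_time(), and receive_decision() are hypothetical, times are plain long integers, an LP with no input arcs is assumed to have an unbounded safe time, and the case where VSIM's next event time equals the safe time (which the text does not specify) is treated here as blocking.

```c
#include <limits.h>
#include <stddef.h>

typedef enum {
    RETURN_EVENT,          /* hand SPECTRUM's next event to VSIM */
    RETURN_NULL,           /* VSIM proceeds with its own active list */
    SEND_NULLS_AND_BLOCK   /* send null messages downstream, then wait */
} filter_action;

/* Safe time: the minimum input channel time over all input arcs. */
long safe_time(const long *in_chan_time, size_t n_in)
{
    long min = LONG_MAX;   /* assumed value when there are no input arcs */
    size_t i;
    for (i = 0; i < n_in; i++)
        if (in_chan_time[i] < min)
            min = in_chan_time[i];
    return min;
}

filter_action receive_decision(int heard_from_all_upstream,
                               int spectrum_has_event,
                               long spectrum_next_time,
                               long vsim_next_time,
                               long safe)
{
    if (heard_from_all_upstream && spectrum_has_event &&
        spectrum_next_time < safe)
        return RETURN_EVENT;

    if (vsim_next_time < safe)
        return RETURN_NULL;

    return SEND_NULLS_AND_BLOCK;
}
```

The time stamp placed on the null messages in the blocking case follows the rule quoted from Chapter 3 and is deliberately left out of this sketch.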

4.3.3 Termination. When an LP has completed the simulation, it builds a null message with
the maximum simulation time and sends it to all downstream nodes. Then it calls node_terminate,
which signals to the host that the LP’s simulation has completed. Ideally, a terminate filter should
be used instead of relying on the application to create and send a null message and make a node-level
function call.8


8Such  a  filter  now  exists  in  the  latest  version  of  SPECTRUM.  VSIM  uses  this  new  version,  but  it  does  not  use  a 
terminate  filter.  Modification  should  be  relatively  straightforward  and  simple. 




V.  Results 


5.1  Introduction. 

In  this  chapter,  the  performance  of  several  VHDL  circuit  simulations  is  discussed.  First,  three 
small  adders  are  presented:  An  8-bit  carry  save  adder,  an  8-bit  carry  propagate  adder,  and  an  8-bit 
carry lookahead adder. Then, two larger circuits are simulated: A 16-bit bit/byte shifter and an
8x8  Wallace  Tree  multiplier  with  a  16-bit  product. 

With  the  exception  of  the  16-bit  shifter,  each  circuit  is  compiled  and  run  on  both  the  iPSC/860 
and  the  iPSC/2.  Data  is  presented  separately.  The  shifter  produces  a  C  code  representation  of 
the  configuration  file  that  is  too  large  to  compile  on  the  iPSC/2;  therefore,  only  iPSC/860  data  is 
presented  for  the  shifter.  The  largest  circuit  in  terms  of  numbers  of  gates — the  Wallace  Tree — did 
compile  on  the  iPSC/2  due  to  the  hierarchical  circuit  design  and  use  of  incremental  configurations. 

All  one-LP  simulations  represent  the  entire  circuit  as  a  single  process  on  one  node.  One-LP 
simulations  are  the  baseline  for  speedup  calculations. 

For  performance  measurements,  each  configuration  is  run  30  times  and  averaged.  The  total 
time  for  one  simulation  is  considered  to  be  the  maximum  time  of  all  concurrent  processes.  Unless 
otherwise  noted,  all  output  is  turned  off  and  20  input  vectors  or  sets  of  vectors  are  applied  to  each 
circuit,  e.g.,  20  pairs  of  vectors  are  applied  to  the  multiplier,  20  vectors  are  applied  to  the  shifter, 
etc. 
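
The timing arithmetic described above can be sketched in C. The function names run_time(), mean(), and speedup() are hypothetical, and the speedup formula (baseline one-LP time divided by parallel time) is the conventional definition, assumed here rather than quoted from the text.

```c
#include <stddef.h>

/* Total time of one simulation run: the maximum over all processes. */
double run_time(const double *proc_time, size_t n_procs)
{
    double max = 0.0;
    size_t i;
    for (i = 0; i < n_procs; i++)
        if (proc_time[i] > max)
            max = proc_time[i];
    return max;
}

/* Average over repeated runs (30 runs per configuration in this chapter). */
double mean(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += x[i];
    return (n > 0) ? sum / (double)n : 0.0;
}

/* Speedup relative to the one-LP baseline (assumed conventional formula). */
double speedup(double one_lp_time, double parallel_time)
{
    return one_lp_time / parallel_time;
}
```

A value greater than one indicates that the parallel configuration finished sooner than the one-LP baseline.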

5.2  Program  Validation. 

Programs  are  validated  by  comparing  them  with  Intermetrics’  output.  The  process  is  as 
follows: 

1.  Run  the  simulation  using  Intermetrics’  simulator. 
Figure 31.  Sample Intermetrics Output for Carry Lookahead Adder

2.  Generate  an  Intermetrics  report.  Example  output  for  a  portion  of  the  carry  lookahead  adder 
simulation  is  shown  in  Figure  31.  The  circuit  adds  two  8-bit  vectors,  X  and  Y,  along  with 
a  carry  in,  CIN  (see  the  schematic  on  page  66).  Figure  31  shows  the  values  01010101  and 
01001100 are applied to the adder through X and Y respectively, while CIN remains a zero.
The sum, Z, is 10100001 at 30 ns; and the carry output, COUT, remains a zero. Also, X, Y,
and  CIN  are  given  new  values  to  begin  another  addition  (whose  result  is  not  shown). 

3.  After  filtering  the  intermediate  C  code  through  the  postprocessor  and  linking  with  VSIM, 
run  the  simulation  in  sequential  mode  under  VSIM.  An  example  of  this  output  for  the  same 
portion  of  the  carry  lookahead  adder  is  shown  in  Figure  32.  Note  the  output  of  VSIM  shows 
only  the  bits  that  have  changed  in  each  bit  vector.  For  example,  at  30  ns  only  bit  5  of  Z 
has  changed  (from  a  zero  to  a  one).  This  output  can  be  directly  mapped  to  the  output  of 
Figure  31. 

4.  Sort  the  output  from  the  VSIM  sequential  run  by  time  and  signal  name,  respectively.1 

5.  Validate  this  output  by  comparing  with  the  Intermetrics  report. 

1  The  output  is  already  in  time  order;  however,  this  Bort  organizes  the  signals  while  maintaining  the  time  order. 
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Figure  32.  Sample  VSIM  Output  for  Carry  Lookahead  Adder 
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6.  Run  the  simulation  in  any  parallel  configuration,  concatenate  the  LP  output  files,  sort  them 

by  time  and  signal  name,  and  use  diff  to  compare  them  with  the  validated  output. 
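
Step 6 can be sketched as follows. This is only an illustration under stated assumptions: the record layout (one "time signal value" triple per line) and the function names are hypothetical, since the actual VSIM output format of Figure 32 is not reproduced here. The thesis itself uses sort and diff on the output files.

```python
def parse(lines):
    """Parse 'time signal value' lines into (time, signal, value) tuples."""
    records = []
    for line in lines:
        if line.strip():
            time_ns, signal, value = line.split()
            records.append((int(time_ns), signal, value))
    return records


def merge_lp_outputs(lp_outputs):
    """Concatenate per-LP outputs and sort by time, then signal name."""
    merged = [rec for lines in lp_outputs for rec in parse(lines)]
    return sorted(merged)  # tuples compare by time first, then signal


def matches_validated(lp_outputs, validated_lines):
    """True if the merged parallel output equals the validated sequential output."""
    return merge_lp_outputs(lp_outputs) == sorted(parse(validated_lines))


# Hypothetical output of two LPs and the validated sequential output.
lp0 = ["30 Z(5) 1", "10 X(0) 1"]
lp1 = ["10 Y(2) 1"]
validated = ["10 X(0) 1", "10 Y(2) 1", "30 Z(5) 1"]
print(matches_validated([lp0, lp1], validated))
```

Sorting both sides by the same key makes the comparison independent of which LP reported a change and of the order in which the LP files were concatenated.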

Input test vectors for the adders are taken from Comeau (10:5-15). His test vector patterns
are designed to verify that each individual logic gate acts correctly for all possible inputs. For
the shifter, patterns are chosen to verify that logic 1s and 0s shift left or right one or eight bits,
depending on the input control signals. Also, 0s shifted in (one or eight bits) are verified. The
multiplier is tested to verify limits and various intermediate values. For example, input pairs (0,
0), (0, number), (number, 0), (0, max), (max, 0), (number, max), and (max, number), and several
combinations of (number, number) are tested and verified.

5.3  Circuit  Partitioning. 

No  attempt  is  made  to  find  the  optimal  circuit  partitions;  however,  the  absence  or  presence 
of  speedup  is  discussed  for  each  simulation.  In  general,  larger  or  more  complex  simulations  exhibit 
better  speedup.  Even  though  the  presence  of  feedback  can  significantly  inhibit  performance  in  the 
null  message  protocol,  very  large  circuits  can  still  achieve  speedup  through  parallel  simulation. 

The full adders that make up the carry save and carry propagate adders are partitioned
symmetrically. For eight-LP simulations, each full adder is assigned to an LP; for four-LP simulations,
two full adders are assigned to each LP, etc.
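
A minimal sketch of this symmetric assignment is shown below. The function name and the numbering of full adders from 0 to 7 are assumptions for illustration; grouping adjacent full adders on the same LP follows what Section 5.5.2 states for the carry propagate adder.

```python
def block_partition(num_adders, num_lps):
    """Assign adjacent full adders to LPs in equal-sized blocks."""
    if num_adders % num_lps != 0:
        raise ValueError("full adders must divide evenly among the LPs")
    per_lp = num_adders // num_lps
    return {adder: adder // per_lp for adder in range(num_adders)}


# Eight full adders on four LPs: two adjacent full adders per LP.
print(block_partition(8, 4))
```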

The partitioning for the carry lookahead adder is from Comeau’s research. This adder is
partitioned  to  avoid  imposing  feedback  among  LPs  and  to  reduce  the  number  of  behaviors  on 
successive  downstream  LPs  (10:5-2). 

Due to the large number of behaviors, the 16-bit shifter and the multiplier are simulated with a
uniform random distribution of behaviors to LPs. Even though these circuits are combinational and
“feedforward,” such a distribution imposes feedback among the LPs. The results of this research
indicate that these larger circuits can be correctly simulated; therefore, more aggressive partitioning
strategies can be investigated in the future.
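
The following sketch illustrates why a random assignment can impose feedback among LPs even when the circuit itself is feedforward. It is an assumption-laden illustration: behaviors are numbered integers, the circuit is given as a list of directed (source, destination) behavior pairs, and all function names are hypothetical rather than part of VSIM.

```python
import random


def random_partition(num_behaviors, num_lps, seed=0):
    """Uniform random assignment of behaviors to LPs (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.randrange(num_lps) for _ in range(num_behaviors)]


def lp_edges(behavior_edges, assignment):
    """Directed LP-to-LP links induced by signals that cross LP boundaries."""
    return {(assignment[u], assignment[v])
            for u, v in behavior_edges if assignment[u] != assignment[v]}


def has_feedback(edges):
    """True if the directed LP graph contains a cycle (depth-first search)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # 1 = visit in progress, 2 = finished

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in graph)


# A feedforward chain of four behaviors: 0 -> 1 -> 2 -> 3.
chain = [(0, 1), (1, 2), (2, 3)]
print(has_feedback(lp_edges(chain, [0, 0, 1, 1])))  # contiguous blocks
print(has_feedback(lp_edges(chain, [0, 1, 0, 1])))  # interleaved assignment
```

In the interleaved case LP 0 sends to LP 1 and LP 1 sends back to LP 0, so the LP graph has a cycle although the behavior graph has none.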

5.4  Explanation  of  Charts. 

For  each  chart,  the  performance  of  the  simulation  time  and  the  total  LP  time  are  presented. 
The  difference  is  that  the  simulation  time  represents  the  total  time  for  an  LP  to  execute  the  core 
simulation  algorithm,  as  presented  in  Figure  21  on  page  36.  The  LP  time  represents  the  total 
time  an  LP  executes,  i.e.,  overhead  is  included  for  SPECTRUM  initialization,  behavior  and  signal 
instantiation,  and  close-out. 

All  data  is  summarized  in  Appendix  E. 

5.5  Circuit  Simulations. 

5.5.1  Carry  Save  Adder.  The  8-bit  carry  save  adder,  shown  in  Figure  33,  is  composed  of 
eight  independent  full  adders.  The  simulation  has  a  total  of  64  behaviors.  Circuit  partitioning  is 
straightforward  due  to  the  lack  of  communication  among  the  full  adders. 

Figure 34 shows the performance and speedup of the carry save adder for the iPSC/2. Note
that the simulation loop exhibits superlinear speedup, i.e., the speedup exceeds the number
of LPs. This is due to the significantly reduced search and post times in each active list, as
well as the reduced number of behaviors executing on each LP.

In parallel simulations, each LP maintains an active list that contains only the signals that affect
behaviors belonging to that LP. The total number of behavior executions, and therefore the total
number of signal records generated, is dynamic. If there are m behaviors and n signal records posted
in one circuit simulation, a sequential simulation may be bounded by $O(n^2 m)$, since each of the
n signal changes corresponds to up to m behaviors executing,2 each of which then posts its resulting
signal into the active record list in $O(n)$ time.

Figure 33. Schematic Diagram of the 8-bit Carry Save Adder (10)

If this simulation is now divided evenly between two LPs that require no communication
between them (trivially parallel), then each LP has $m' = m/2$ behaviors, and the total number
of signal records generated and posted can be estimated to be $n' = n/2$. The overall execution is
then $O((n')^2 m') = O(n^2 m / 8)$. This means that a trivially parallel circuit
evenly distributed between two nodes can execute as much as eight times faster than a sequential
simulation of the same circuit. Likewise, trivially parallel, balanced circuits partitioned among
four and eight nodes can execute 64 and 512 times faster, respectively. This, of course, is a very
high bound on speedup estimations because the number of generated signals is estimated, and the
number of behaviors scheduled for execution due to one signal change is almost always significantly
less than the total number of behaviors.

2 This corresponds to one time through the simulation loop. This is a very high estimate, as one signal change
rarely directly affects every component of a circuit.
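
The arithmetic behind the 8, 64, and 512 figures can be checked with a short sketch. The function names are hypothetical, and the value of n below is arbitrary; the bound depends only on the number of LPs k, since dividing both n and m by k reduces the n-squared-times-m estimate by a factor of k cubed.

```python
from fractions import Fraction


def work_bound(n, m):
    """The n^2 * m estimate for n posted signal records and m behaviors."""
    return n * n * m


def speedup_bound(n, m, num_lps):
    """Ratio of the sequential bound to the per-LP bound for an even split."""
    k = num_lps
    per_lp = work_bound(Fraction(n, k), Fraction(m, k))
    return Fraction(work_bound(n, m)) / per_lp


# 64 behaviors as in the carry save adder; n = 640 is an arbitrary choice.
for k in (2, 4, 8):
    print(k, speedup_bound(640, 64, k))
```

Exact fractions are used so that the ratio is not affected by rounding.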

Figure 34 also shows that the overall LP time for the carry save adder never falls below 800 ms.
Therefore, although the simulation loop improves in performance, the overall LP time is
bounded by SPECTRUM’s initialization and close-out functions. As other simulations show, this
limitation disappears as circuit sizes and complexities increase.

For the iPSC/860, Figure 35 shows the computation times are significantly reduced as the
number of LPs is increased. Also, less total execution time is spent on SPECTRUM overhead;
however, this overhead increases with the number of LPs due to increased contention for common
resources, such as input files and node-to-host synchronization.

5.5.2  Carry  Propagate  Adder.  With  the  8-bit  carry  propagate  adder  of  Figure  36,  the  carry 
output  of  each  adder  is  “propagated”  to  the  next  full  adder.  This  introduces  communication  among 
the  LPs.  Otherwise,  partitioning  is  the  same  as  that  of  the  carry  save  adder.  Adjacent  full  adders 
are  assigned  to  the  same  LP  in  order  to  reduce  LP  communications.  The  simulation  of  the  carry 
propagate  adder  consists  of  57  behaviors. 

For  the  iPSC/2,  Figure  37  shows  a  maximum  simulation-loop  speedup  of  about  2.3  for  either 
two  or  four  LP  configurations.  The  total  LP  time  shows  a  speedup  of  about  1.5  for  four  LPs.  At 
eight  LPs,  the  communications  overhead  overcomes  the  computation,  and  no  speedup  is  obtained. 
For  the  iPSC/860  simulations  of  Figure  38,  the  simulation  time  shows  a  modest  1.2  speedup  on 
two  LPs;  however,  the  overall  LP  time  shows  no  speedup  whatsoever.  As  is  the  case  with  the 
carry  save  adder,  the  carry  propagate  adder  is  too  small  to  show  much  promise  of  speedup  on  the 
iPSC/860 — regardless  of  the  addition  of  LP  communication  requirements.3 

3 “Too small” can mean either a small number of components (behaviors), or a small number of test vectors, since
each contributes to greater active lists and numbers of behaviors scheduled. Therefore, if the number of input vectors
(test vectors) were increased sufficiently, the same carry propagate adder may no longer be “too small.”

Figure 35. Performance of the Carry Save Adder on the iPSC/860
Figure 36. Schematic Diagram of the 8-bit Carry Propagate Adder (10)

5.5.3  Carry  Lookahead  Adder.  The  8-bit  carry  lookahead  adder  is  made  of  two  4-bit  carry 
lookahead  adders,  as  shown  in  Figure  39.  The  simulation  consists  of  77  behaviors.  For  two-LP 
simulations,  each  4-bit  adder  is  assigned  an  LP.  Four-  and  eight-LP  simulations  are  partitioned  to 
avoid  imposing  feedback,  as  well  as  to  “front  load”  upstream  LPs  with  more  behaviors,  as  shown 
in Figures 40 and 41. Partitioning is shown for only the lower 4-bit adder; however, it is the same
for the upper 4-bit adder.

For carry lookahead adder simulations on the iPSC/2, shown in Figure 42, all multi-LP
simulations exhibit speedup over the one-LP simulation. The best speedup for this circuit is
2.5,  which  occurs  for  the  four-LP  simulations.  This  circuit  is  “larger”  than  the  two  previous 
adders,  and  the  overall  LP  time  more  closely  follows  the  trends  of  the  “inner”  simulation  times.  As 
circuits  continue  to  grow,  this  becomes  more  and  more  apparent.  On  the  iPSC/860,  however,  the 
computation  time  of  the  node  processors  still  overcomes  the  benefits  of  partitioning  the  circuit,  as 
shown in Figure 43.

Figure 37. Performance of the Carry Propagate Adder on the iPSC/2
Figure 38. Performance of the Carry Propagate Adder on the iPSC/860
Figure 39. Schematic Diagram of the 8-bit Carry Lookahead Adder (10)
Figure 41. Eight-LP Partition of the Carry Lookahead Adder (Lower Four Bits Shown)
Figure 42. Performance of the Carry Lookahead Adder on the iPSC/2
Figure 43. Performance of the Carry Lookahead Adder on the iPSC/860

5.5.4 Shifter. The 16-bit shifter shifts a 16-bit word either one or eight bits right or left
depending on the control inputs. The schematic diagram is shown in Figure 44. The simulation of
this shifter contains 309 behaviors. Because of this large number, partitioning is accomplished via
a uniform random distribution of behaviors to LPs. This means feedback is artificially introduced
due to LP communication. Therefore, performance in parallel configurations is not very promising,
as shown in Figure 45. It does, however, demonstrate that larger circuits can be correctly simulated
in parallel on the iPSC/860.

5.5.5 Multiplier. The Wallace Tree Multiplier is the largest circuit tested. It contains 1050
behaviors. Schematic diagrams and a description of the hierarchical design are included in
Appendix D. Behavior partitioning is once again random. Fortunately, the multiplier was created
with a hierarchical set of components and configuration descriptions, and the corresponding C code
is not too large for either the iPSC/2 or the iPSC/860.

Figures  46  and  47  show  multiplier  performance  on  the  iPSC/2  and  iPSC/860,  respectively. 
Both  hypercubes  demonstrate  increasing  speedup  as  the  circuit  is  simulated  on  two  and  then  four 
LPs.  This  is  encouraging  and  somewhat  surprising  since  the  random  partitioning  again  imposes 
feedback  among  LPs.  Because  of  these  results,  greater  performance  improvements  can  be  expected 
for  very  large  circuits  if  partitioning  algorithms  can  be  generated  to  avoid  excess  LP  feedback. 

5.6  Performance  vs.  Test  Vector  Quantity. 

The  carry  lookahead  adder  is  now  modified  to  apply  64  pairs  of  input  vectors  instead  of  20. 
This  corresponds  to  a  larger  initial  active  list,  more  active  records,  and  therefore  more  executions 
of behaviors. Figure 48 shows the corresponding speedup increases as the number of LPs increases.
The maximum speedup here is 6.69 for eight LPs. For the iPSC/860 of Figure 49, speedup also
improves, but the maximum is 3.75 for four LPs. These trends were similar for the other circuits.

Figure 44. Schematic Diagram of the 16-bit Shifter
Figure 46. Performance of the Wallace Tree Multiplier on the iPSC/2
Figure 47. Performance of the Wallace Tree Multiplier on the iPSC/860
Figure 48. Performance of the Carry Lookahead Adder with 64 Input Vectors Applied (iPSC/2)
Figure 49. Performance of the Carry Lookahead Adder with 64 Input Vectors Applied (iPSC/860)

When VLSI designers test their circuits with a large number of input vectors, they are likely
to  automate  this  procedure  by  creating  test  pattern  generators  in  VHDL.  This  is  not  what  is 
simulated  here.  The  setup  for  VSIM  “hardwires”  the  input  signals,  through  VHDL  source  code,  at 
the  beginning  of  the  simulation.  In  this  way,  the  active  list  is  loaded  with  all  input  signal  changes 
at  the  beginning  of  the  simulation.  With  automatic  test  pattern  generation,  a  behavior  is  created 
to  periodically  generate  an  input  signal  change,  according  to  the  rules  specified  in  the  VHDL  source 
code.  Therefore,  these  signal  changes  posted  to  the  active  list  are  done  throughout  the  simulation, 
and  not  all  at  the  beginning.  Automatic  test  pattern  generation  is  not  implemented  in  this  research 
effort. 

5.7 Multitasking LPs on One Physical Processor.

The Intel 80386 processors of the iPSC/2 allow multiple processes. The carry lookahead adder
and Wallace tree multiplier were simulated on one node with one, two, four, and eight LPs. Results
are shown in Figures 50 and 51, respectively. Note that speedups of slightly more than one are
achieved with two- and four-LP simulations. These speedups are even greater as the number of
input vectors is increased.

It  has  already  been  shown  that  one  benefit  of  partitioning  circuits  is  reduced  active  list  search 
and  post  time.  Clearly  in  sequential  simulations,  performance  could  be  improved  if  the  active  list 
search  and  post  time  were  reduced.  This  is  inherently  a  part  of  the  parallel  simulation  paradigm 
for  VSIM.  Improving  the  sequential  algorithm  makes  all  parallel  configurations  run  faster — and  it 
increases  the  challenge  of  achieving  relative  speedup  through  parallelization,  as  is  the  case  with 
using faster processors like those used in the iPSC/860.

Figure 50. Performance of the Carry Lookahead Adder with all LPs Run on One Node (iPSC/2)
Figure 51. Performance of the Wallace Tree Multiplier with all LPs Run on One Node (iPSC/2)

5.8 Performance with Output Enabled.


All  reported  performance  data  is  with  output  turned  off,  i.e.,  signal  changes  are  not  reported. 
Unfortunately,  VSIM  either  reports  all  signal  changes  or  no  signal  changes.  Commercial  simulators, 
like  Intermetrics  VHDL,  allow  the  user  to  specify  which  signals  to  report. 

With output enabled, each LP writes every signal change to an lp.out file. Since the hypercube
nodes share the file system with each other and the host, much more simulation time is spent on
operating system contention and file management. Execution time is significantly increased,
and file I/O overwhelms the benefits of parallelization.

VI. Conclusions/Recommendations.


6.1  Research  Summary. 

Many  circuit  designs  are  too  complex  to  be  simulated  with  VHDL  in  a  reasonable  amount 
of  time.  In  an  effort  to  improve  VHDL’s  performance,  an  environment  is  created  to  simulate 
hierarchical  structural  VHDL  circuits  in  parallel  on  Intel  Hypercube  architectures. 

The  output  from  Intermetrics  VHDL  compile  and  model  generate  phases  is  transformed  into 
code  compatible  with  AFIT’s  parallel  simulator.  The  simulator  can  run  sequentially  or  in  parallel 
on the Intel iPSC/2 and iPSC/860. Logic gates and system behaviors are partitioned among the
processors,  and  signal  changes  are  shared  via  event  messages. 

The  transformation  and  parallel  simulation  tools  are  demonstrated  using  three  small  adders: 
an  8-bit  carry  save,  an  8-bit  carry  propagate,  and  an  8-bit  carry  lookahead.  Two  larger  circuits  are 
also  demonstrated:  a  16-bit  bit/byte  shifter  and  an  8x8  Wallace  tree  multiplier. 

No  attempt  is  made  to  find  optimal  partitioning  strategies;  however,  speedups  are  obtained 
for  some  configurations. 

6.2  Conclusions. 

With  the  parallel  VHDL  simulator,  much  research  can  now  be  accomplished  with  respect 
to  partitioning  algorithms,  computation/communication  balancing,  etc.  However,  the  following 
general observations can be made about parallel simulations of structural VHDL circuits:

• Large circuits have a better chance to exhibit speedup. Large circuits mean more behaviors.
More behaviors mean larger active lists, which contribute to increased computation on each
LP. However, a poor partition can inhibit speedup, as larger active lists also correspond to
increased communications. If feedback is imposed among LPs, a great number of null messages
are generated to avoid deadlock. Increasing communications reduces speedup.

• Balancing computation and communication times is hardware dependent. The node processors
of the iPSC/2 are Intel 80386 processors, while the iPSC/860 uses much faster i860
processors, which corresponds to less computation time. Therefore, a good circuit partition
on the iPSC/2 may not be as effective on the iPSC/860.

• SPECTRUM overhead is not a factor in large circuit simulations. It was noted, however,
that for small simulations, the overhead of initializing SPECTRUM reduced the performance
of the overall simulation. For larger circuit simulations, SPECTRUM overhead is essentially
constant regardless of circuit size or configuration.

• Performance of all simulations could be improved if active list management were improved. One
reason for obtaining speedup was reduced active list search and post times due to partitioning
the behaviors, and implicitly, their output signals.

The most important conclusion is that large structural VHDL circuits can be simulated and run
with speedup on the Intel Hypercubes.

6.3  Recommendations  for  Further  Research. 

6.3.1  Parallel  Simulation  Recommendations.  The  interesting  work  to  be  done  in  the  future 
involves  experimenting  with  the  parallel  simulation  protocols  and  partitioning  algorithms.  Some 
suggested  areas  of  interest  are 

•  Try  various  simulation  protocols.  Since  SPECTRUM  is  now  the  underlying  testbed,  a  number 
of  existing  filters  can  be  examined  for  their  compatibility  with  VSIM. 

• Create a Time Warp version of VSIM. Time Warp requires state-saving. The state of VSIM
is identified by the simulation clock, the active list, and the global address space for signal
values. If the address space were more efficiently “packed,” then saving state would require
much less overhead. Currently, each signal value (‘0’ or ‘1’) is inefficiently stored in a 32-bit
word in memory. Packing these values aids in memory reduction, but may inhibit future
enhancements to the VHDL subset, such as implementing signals as integers instead of bits,
etc.

•  Determine  effective  partitioning  strategies.  This  is  the  subject  of  much  research  in  industry 
and  academia.  To  this  extent,  AFIT  has  begun  work  on  a  VHDL  graph  tool  that  reads  the 
VSIM  behavior  numbers  and  relationships,  and  generates  (among  other  things)  behavior-to- 
LP  mapping  files. 

•  Run  simulations  on  larger  parallel  processors.  With  the  automation  of  intermediate  C  code 
translation  and  circuit  partitioning,  much  larger  circuits  can  be  simulated.  Simulating  on 
larger  parallel  processors  will  aid  in  providing  more  concurrency  and  greater  speedup. 
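
The packing idea in the Time Warp recommendation above can be sketched as follows. This is an illustration only, not VSIM's data layout: the function names and the choice of storing signal i in bit (i mod 32) of word (i div 32) are assumptions. It shows that 32 one-bit signal values fit in the space that currently holds one.

```python
def pack_bits(values):
    """Pack a list of 0/1 signal values into a list of 32-bit words."""
    words = [0] * ((len(values) + 31) // 32)
    for i, value in enumerate(values):
        if value not in (0, 1):
            raise ValueError("only two-valued signals can be packed this way")
        words[i // 32] |= value << (i % 32)
    return words


def unpack_bits(words, count):
    """Recover the first `count` signal values from packed words."""
    return [(words[i // 32] >> (i % 32)) & 1 for i in range(count)]


signals = [1, 0, 1] * 20            # 60 hypothetical signal values
snapshot = pack_bits(signals)       # state to save: 2 words instead of 60
print(len(snapshot), unpack_bits(snapshot, len(signals)) == signals)
```

The ValueError branch reflects the trade-off noted in the text: a packed layout of this kind does not extend directly to integer or multi-valued signals.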

6.3.2  Improving  the  Postprocessor.  Currently,  the  postprocessor  expects  there  to  be  one 
configuration  description  for  each  simulation.  If  configurations  are  broken  into  multiple,  hierarchical 
descriptions,  then  the  corresponding  intermediate  C  code  is  significantly  smaller.  On  page  S5  of 
Appendix  B,  two  ways  to  use  the  postprocessor  on  large  VHDL  circuits  are  discussed: 

•  Run  plex  directly  on  each  C  code  description  generated  in  the  model  generate  phase. 

•  Reconstruct  the  VHDL  circuit  using  hierarchical  configuration  descriptions. 

If  hierarchical  configuration  descriptions  are  used,  then  the  user  must  identify  the  include  files 
by  examining  the  intermediate  code  before  it  is  filtered.  Automation  of  this  function  should  be 
included  as  an  expansion  to  the  postprocessor. 

6.3.3  Expanding  the  VHDL  subset.  The  two  most  beneficial  enhancements  to  the  subset  of 
circuits  that  VSIM  can  simulate  are  resolution  functions  and  wait  statements. 

With  support  of  resolution  functions,  a  vast  number  of  existing  structural  VHDL  circuit 
designs  can  be  acquired  and  tested.  A  suggested  method  for  adding  this  to  the  subset  is 


1.  Create  a  small  circuit  that  uses  a  resolution  function. 


2.  Extract  the  corresponding  intermediate  C  code  representation  of  the  function. 

3.  Identify  external  data  structures  and  function  calls  used  in  the  code. 

4.  Determine  if  VSIM  can  support  the  intermediate  representation. 

5.  If  VSIM  does  not  support  the  intermediate  representation,  design  the  necessary  support 
routines  and/or  data  structures,  using  Intermetrics’  simulator  source  code  as  a  guide. 

This  process  can  also  be  used  to  implement  automatic  test  pattern  generation  and  multi-valued 
logic. 

The  first  step  to  simulating  behavioral  VHDL  circuits  is  implementation  of  wait  statements. 
A  suggested  method  for  adding  wait  statements  is 

1. Create simple processes with wait, wait for, wait on, and wait until statements.

2.  Extract  the  corresponding  intermediate  C  code  and  identify  the  methods  and  data  structures 
as  suggested  for  resolution  functions. 

3. Using Intermetrics’ simulator source code as a guide, build queues for waiting processes. If
processes are allowed to “wait on” events, then execution of events that satisfy the wait
condition can schedule the waiting processes (behaviors).

6.3.4  Other  Recommendations. 

6.3.4.1 Considerations for Generating Output. Change VSIM to report only the signal
changes specified by the user. When an Intermetrics report is generated, only the signals of interest
are reported, based on the user’s specification in a “report control language” file. When VSIM
executes with output enabled, every signal change is recorded in each LP’s output file (if the LP
“owns” the behavior that caused the signal change). Since the nodes of the Intel Hypercubes share
the same file system, this causes a significant decrease in performance when output is enabled.

It would be beneficial to specify only the signals of interest for two reasons. First, it is how
commercial simulators handle output, as in Intermetrics’ case. Second, parallel performance with
output enabled will improve with the reduced file contention.

This can be accomplished in a number of ways. For instance, Intermetrics’ report control language
procedures could be studied and emulated in VSIM. A simpler approach would be to modify VSIM
to compare each signal name with a list of signals of interest. The list could be built from a user file
at the start of the simulation. If the changing signal is in the user-specified list, then the change is
recorded in the output file and its new signal value is updated in memory. Otherwise, the change
is not recorded in the output file; however, the new signal value is still updated in memory.
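
The simpler approach described above can be sketched as follows. The function names, the one-name-per-line user file layout, and the in-memory report list are hypothetical; VSIM itself would write to each LP's output file.

```python
def load_signals_of_interest(lines):
    """Build the set of signal names from a user file, one name per line."""
    return {line.strip() for line in lines if line.strip()}


def apply_change(memory, report, watched, time_ns, signal, value):
    """Always update the signal value; record the change only if it is watched."""
    memory[signal] = value
    if signal in watched:
        report.append((time_ns, signal, value))


watched = load_signals_of_interest(["Z", "COUT", ""])
memory, report = {}, []
apply_change(memory, report, watched, 30, "Z", "10100001")
apply_change(memory, report, watched, 30, "G1", "1")  # internal signal
print(report)
print(memory)
```

A set is used for the list of signals so that each membership test takes roughly constant time regardless of how many signals the user names.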

6.3.4.2 Design Method for VHDL Circuits. Design circuits hierarchically, using hierarchical
configuration files. Hierarchical configurations are better for two reasons. First, as already
discussed, the corresponding intermediate C code is more likely to compile on the hypercubes
without running out of memory. Second, hierarchical circuit descriptions (vs. large, flat descriptions)
provide insight into possible circuit partitionings by identifying groups of functionally related
components. For example, a multiplier that uses sets of adders could be partitioned by assigning the
components that make up each adder to the same LP.
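
The multiplier example can be sketched as follows, assuming each component carries a hierarchical path name whose first element identifies its parent adder. The path format, the function name, and the round-robin assignment of groups to LPs are all hypothetical choices for illustration.

```python
def partition_by_group(component_paths, num_lps):
    """Assign every component of the same top-level group to the same LP."""
    groups = sorted({path.split("/")[0] for path in component_paths})
    group_to_lp = {group: i % num_lps for i, group in enumerate(groups)}
    return {path: group_to_lp[path.split("/")[0]] for path in component_paths}


paths = ["add0/xor1", "add0/and1", "add1/xor1", "add2/xor1"]
print(partition_by_group(paths, 2))
```

Keeping an adder's components together means that signals internal to the adder never cross an LP boundary, so only the adder's inputs and outputs generate event messages.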
Appendix A. Definitions


A.1 Discrete-Event Digital Simulation Definitions.

The  following  terms  are  used  to  discuss  discrete-event  digital  simulation: 

Component  Any  subsystem  of  a  circuit  that  can  be  modeled  as  an  entity,  regardless  of  the  level 
of  hierarchy.  For  example,  an  AND  gate,  an  arithmetic/logic  unit,  etc. 

Entity  Any  component  in  the  system  which  requires  representation  in  the  model  (2). 

Event  Any  action  that  causes  the  simulation  model  to  change  from  one  state  to  another  (15). 
Typical  events  include  the  changing  of  any  process’s  state  variables,  the  arrival  of  a  message 
at  a  process,  or  the  transmission  of  a  message  from  one  process  to  another. 

Message  State  information  transmitted  among  processes. 

Model  An  abstract  representation  of  a  physical  system  (2).  There  may  be  a  number  of  models 
for  a  given  system.  For  example,  a  digital  circuit  can  be  modeled  by  a  gate-level  schematic 
diagram,  a  block  diagram,  a  dataflow  graph,  etc. 

Process  The  succession  of  states  of  an  entity  over  time  (26:136).  A  logical  process  (LP)  is  the 
model’s  representation  of  a  physical  process  (PP)  in  the  system  (7:198-199).  It  is  common 
to  refer  to  an  entity  as  a  process,  although,  strictly  speaking,  there  is  a  distinction  in  the 
meanings. 

State  A  collection  of  variables  that  describes  the  condition  of  an  entity  or  system  at  any  given 
time  (26:136). 

System  The  real-world  process  to  be  modeled  and  simulated,  e.g.,  an  electronic  circuit  (2). 

A.2 VHDL Definitions.

The following VHDL terms are used in this thesis:

Architectural Body The description of the internal behavior or structure of a design entity (10:2-11).
A structural description defines an architecture by what its subcomponents are and how
the subcomponents are connected to each other (22:107). A behavioral description is used at
the lowest level of decomposition and shows how the entity transforms inputs to outputs (10:2-11).
See Figure 52 for an example of the relationships among behavioral and structural circuit
descriptions.

Block  A  block  may  be  used  to  define  a  subsystem  of  an  architecture  description  (20).  Blocks  may 
be  nested,  and  they  may  run  concurrently. 

Component The building block of hardware description, at any level of hierarchy. For example,
an AND gate, a register, a chip, or a circuit board (22:18).

Design Entity The discrete system used to model a digital device. It defines the inputs and
outputs of a hardware design and performs a well-defined function (22:10). A design entity
may represent an entire system, a sub-system, a board, a chip, a macro-cell, a logic gate, or
any level of abstraction in between (10:2-11). A design entity consists of an entity declaration
and an architectural body (22:10).

Design Hierarchy The result of successive decomposition of a design entity into components. It
also binds those components to other design entities that may be decomposed in like manner.
Taken together they represent a complete design. Such a collection of design entities is called
a design hierarchy (10:2-12).

Entity Declaration The entity declaration defines the component’s interface to the external
environment; it specifies the ports of the entity through which data may flow in and out (22:18).

External  Block  The  top-most  block  in  a  hierarchy.  This  block  is  the  design  entity  itself,  and  it 
defines  the  interface  of  the  design  entity  to  the  external  environment  (10:2-12). 

Inertial  Delay  Delay-type  representing  components  which  require  the  value  on  inputs  to  persist 
for a given time before the component responds (22:71).

Model The elaboration of the design hierarchy in the VHDL simulation environment. The model
is  executed  to  simulate  the  behavioral  or  structural  design  of  the  circuit  under  test  (10:2-12). 

Port  A  signal  that  appears  in  the  interface  list  of  an  entity  declaration  (10:2-12).  Also,  a  port 
is a component's external interface, the point where data flows into and out of the component
(22:18).

Process  A  collection  of  operations  applied  to  signals.  The  operations  are  sequential  descriptions  of 
component  behavior.  Processes  are  said  to  run  concurrently.  Therefore,  VHDL  descriptions 
can  be  thought  of  as  a  set  of  independent  programs  running  in  parallel  (22:9). 

Signal  An  object  that  holds  a  value  and  directly  corresponds  to  some  type  of  metal  interconnection 
within  a  circuit  (10:2-12).  Signals  define  the  pathways  among  processes  (22:9). 

Transport  Delay  Delay-type  representing  an  output  which  always  occurs  regardless  of  the  time 
duration  of  the  input  signals  (22:71). 
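
The difference between the two delay types can be sketched in a few lines of C. This is only an illustration of the definitions above, not code from VSIM or Intermetrics; the names delay_kind and pulse_propagates are hypothetical, and the sketch assumes that under inertial delay the input must persist for at least the propagation delay itself.

```c
#include <stdbool.h>

/* Delay kinds named after the glossary entries (hypothetical names). */
typedef enum { DELAY_INERTIAL, DELAY_TRANSPORT } delay_kind;

/*
 * Returns true if an input pulse of width pulse_fs causes an output
 * change on a component whose propagation delay is delay_fs.
 * Assumption: for inertial delay the pulse must persist for at least
 * the propagation delay; shorter pulses are swallowed.
 */
bool pulse_propagates(delay_kind kind, long pulse_fs, long delay_fs)
{
    if (kind == DELAY_TRANSPORT)
        return true;              /* output always occurs */
    return pulse_fs >= delay_fs;  /* input must persist long enough */
}
```

Under these assumptions, a 5 ns glitch on a gate with a 10 ns inertial delay produces no output event, while the same glitch under transport delay is passed on.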

Note that some terms, such as entity, model, and process, have different meanings depending on the
context: classical simulation or VHDL. The reader is cautioned to interpret each term with respect
to its context.

Figure 52. Example of the Relationships Among Behavioral and Structured Circuit Descriptions
in a Mixed-Level Design

Appendix B. AFIT Parallel VHDL User's Guide


B.1 Overview.

B.1.1  Introduction.  When  a  VHDL  circuit  is  compiled  with  the  Intermetrics  VHDL  toolset, 
the  intermediate  C  code  must  be  intercepted  and  transformed  to  be  linked  with  AFIT’s  parallel 
VHDL  simulator  (VSIM).  VSIM  runs  sequentially  on  a  single  processor,  or  in  parallel  on  the  Intel 
iPSC/2 and iPSC/i860 Hypercubes. For parallel simulations, VSIM runs over SPECTRUM, a
testbed that provides an interface between user applications and the parallel processing environment.
The subset of VHDL circuits that can be simulated with VSIM includes structural descriptions
of logic gates and simple processes.

B.1.2  Process.  The  process  for  developing  and  running  parallel  VHDL  circuit  simulations 
is  as  shown  in  Figure  53.  In  general,  the  following  steps  must  be  taken: 

1.  Write  VHDL  source  code  to  describe  the  circuit  to  be  simulated. 

2.  Compile,  Model  Generate,  and  Build  using  Intermetrics’  VHDL  tools. 

3.  Use  the  postprocessor,  pbuild,  to  generate  C  code  that  can  run  with  VSIM. 

4.  Compile  and  run  the  C  code  with  VSIM  on  a  sequential  processor. 

5.  Use  vmap  to  generate  behavior  id  numbers  and  dependencies. 

6.  Decide  on  partitioning  strategy  and  create  logical  process  (LP)  dependency  file,  lpx.arcs, 
and  behavior-to-LP  mapping  file,  lpx.map. 

7.  Compile  with  VSIM  and  SPECTRUM  on  the  Hypercube  and  run  the  simulation  in  parallel. 


B.1.3 Related Files.

B.1.3.1 The Postprocessor. The postprocessor, called pbuild, is used to translate
Intermetrics' C code into code compatible with VSIM. The files necessary for operation and
maintenance of pbuild are shown in Table 2.

B.1.3.2 VSIM. The AFIT parallel VHDL simulator, VSIM, consists of two groups
of files. The first group, listed in Table 3, contains all of the VSIM-specific files required for
sequential  operation.  When  the  simulation  is  run  in  the  sequential  mode,  the  executable  filename 
is  generally  the  name  of  the  circuit.  When  the  simulation  is  run  on  a  parallel  machine,  the  files 
of  Table  4  are  also  included,  and  the  executable  file  called  by  the  user  is  generally  called  “host,” 
which  loads  each  node  of  the  hypercube  with  the  appropriate  node  programs. 

B.1.3.3 VMAP. VMAP is used to determine the behavior id numbers and dependencies.
In order to use VMAP, run the simulation in sequential mode with MAPPING defined in vsim.h.
Then run the output through the program called vmap. The files required for VMAP operation
and maintenance are shown in Table 5.

B.1.3.4 Other Files. Other files related to VSIM simulations are listed in Table 6.
These include the source code and headers for Intermetrics' intermediate C code, LP dependency
and mapping files, output files, and some helpful scripts.

Figure 53. Overview of Parallel Simulation Session


Table 2. Files Necessary for Maintenance and Operation of the Postprocessor

File              Description
pbuild            Executable called by user, finds intermediate C code and
                  calls plex.
plex              Executable called by pbuild, uses lexical analyzer and
                  regular expressions to find and transform intermediate C
                  code.
pbuild.c          Source code for pbuild.
plex.l            Lex description and rules for pattern matching.
plex.h            Header file for plex.l and plex_routines.c.
plex_routines.c   Routines called by plex.l to transform data.
stack.c           Character stack used by plex_routines.c.
stack.h           Header file for stack.c.
Makefile          Describes sequence of commands necessary for generating
                  executables. The command "make" generates pbuild.
                  Use "make plex" to generate plex.

Table 3. Files Necessary for Maintenance and Operation of VSIM

File        Description
vsim.h      Header file for vinit.c, vsim.c, vtools.c, and vspec.c. Modeled
            after Intermetrics' simutl.h.
vinit.c     Initialization routines for VSIM.
vsim.c      The main simulation loop and functions.
vtools.c    Tools provided for printing VSIM state variables and queues.
            Compilation is optional; only required for maintenance
            purposes.

Table 4. Files Necessary for Maintenance and Operation of Parallel VHDL Simulations using
SPECTRUM


File             Description
vspec.c          Contains the functions that provide VSIM's interface to
                 SPECTRUM.
vfilt.c          Contains the null-message protocol filters. Modeled after
                 AFIT's chanclocks.c.
u_null_filt.c    Table of function-pointers to filters in vfilt.c.
globals.h        The standard header file for SPECTRUM. Modified to
                 redefine event structure.
application.h    Included by globals.h, this file contains application-specific
                 global information for SPECTRUM and vspec.c. Most
                 importantly, this file is where the number of LPs
                 is specified for a particular simulation.
lp_man.c         Provides SPECTRUM's LP-level functions.
cube2.c          Provides hardware interface for lp_man.c.
cube2.h          Header file for cube2.c and host2.c.
host2.c          Host program used to load nodes and start simulation.

Table 5. Files Necessary for Maintenance and Operation of VMAP

File        Description
vmap        Executable used to generate mapping.
vmap.c      Source code for vmap.
list.c      Linked-list functions for vmap.c.
list.h      Header file for list.c.
makefile    Describes command sequence necessary for generation of vmap.

Table 6. Other Files

File          Description
plex.log      Report generated by postprocessor.
(ckt).c       Postprocessor output file, named by the user when invoking
              pbuild. This is the intermediate C code.
big_(ckt).c   Big C file containing intermediate C code prior to
              transforming with plex.
FN*           Header files included by (ckt).c.
lpx.out       Output files for parallel simulations. For example, "lp2.out"
              corresponds to the output of LP2. In sequential simulations,
              the output is sent to "stdout."
lpx.arcs      LP dependencies and output delays, generated by the user.
lpx.map       Behavior-to-LP mapping description, generated by the user.
logx          SPECTRUM reports from each LP.
sgrep         Script used to extract signal changes from VSIM's output and
              sort by time and signal name.

# The following setup Intermetrics' VHDL

set path = ($path /usr/vhdl/bin)
setenv VHDL_BIN /usr/vhdl/bin
setenv VHDL_LIBROOT /usr/vhdl/shiplib
setenv VHDL_COMMON /usr/vhdl/common
setenv VLS_HELP_FILE /usr/vhdl/common/help.txt


Figure  54.  Section  of  .cshrc  File  for  Setting  up  Intermetrics  VHDL  in  the  AFIT  VLSI  Lab 


B.2  Implementation. 

B.2.1 Introduction. This section describes how to create and run parallel VHDL simulations
with VSIM. The following section illustrates these steps with an example.

B.2.2 Generating VHDL Source Code. The first step is to create the VHDL circuit description
in one or more .vhd files. VSIM can simulate structural descriptions of logic gates and other
simple processes. Circuits are created the same way as for Intermetrics' circuits, with the following
limitations:


B.2.2.1 VHDL Source Code Limitations for VSIM. Signals can be bits or bit-vectors,
but bit-vector inputs must be described one bit at a time, e.g., Bus(0) <= '1' after 10 ns;.

Processes should be one-line descriptions (Out1 <= In1 AND In2 after gate_delay;); however,
multiline processes, delimited by begin and end process, may be used provided they either wait
on all signals, or the process terminates after first use, i.e., it contains a wait; statement at the
end of the process block.

It  is  uncertain  how  functions  and  procedures  will  act  in  VSIM.  For  example,  functions  to 
describe  multi-valued  logic — or  signal  resolution — have  not  been  tested.  Their  implementation 
may  or  may  not  be  trivial;  however,  the  file  vsim.h  would  most  likely  have  to  be  modified  to 
include  the  proper  macros  and  type-definitions.  Intermetrics’  file,  simutl.h,  was  used  as  a  baseline 
for  vsim.h,  with  much  of  the  (at  the  time)  unnecessary  data  removed. 

B.2.3 Setting up a User Library for Circuit Models. In order to use Intermetrics' VHDL
simulator, the following environment variables must be defined in the user's .cshrc file: VHDL_BIN,
VHDL_LIBROOT, VHDL_COMMON, and VLS_HELP_FILE. Intermetrics' VHDL is available in the VLSI
lab, and is in the process of being installed on aphrodite in the Parallel Simulation Lab.1 The
correct environment setup for using Intermetrics VHDL in the VLSI lab is shown in Figure 54.

Once the correct environment variables are set, the user creates a work library by using
vls, define, and makelib, as shown in Figure 55.2 The commands setlib and dir can be used
to view the current library and its contents, respectively. For the most convenience when using
the postprocessor, the user should give the work directory the same name as his or her userid
(${LOGNAME}).


1 Also, harcnlss on the VAX cluster has a version of Intermetrics' VHDL simulator.

2 Should the error "YlUMilUKT.KiSEKmOI" ever be raised by Intermetrics, the only solution is to delete the
complete user directory using dslsts-nssr.

lovelace% vls

Standard  VHDL  1076  Support  Environment  Version  2.1b  -  1  February  1990 
Copyright  (C)  1990  Intermetrics,  Inc.  All  rights  reserved. 

VLS>makelib -dir=/usr/vhdl/shiplib/tbreeden <<tbreeden>>
VHDVLS-I-CREATED_LIB - Library <<TBREEDEN>> successfully created.
VLS>define work <<tbreeden>>
VLS>setlib <<tbreeden>>
VHDVLS-I-DEFAULT_LIBRARY - Default library is <<TBREEDEN>>.
VLS>dir
VHDVLS-I-NO_UNITS - No units found in <<TBREEDEN>>.
VLS>exit
lovelace%


Figure  55.  Example  Initialization  of  Intermetrics  VHDL 


B.2.4 Compiling, Model Generating, and Building. Every .vhd file is compiled individually
by  using  the  command  vhdl3,  such  as 

vhdl  nand_gate . vhd 

To "model generate" the specific entity/architecture pairs, the command mg is used; however, the
debug switch -debug=cknd is added as follows:

mg '-debug=cknd nand_gate(simple)'

This generates the required .c and .h files for the postprocessor. This debug switch is also used
in the "build" phase, using the command build:

build '-debug=cknd -replace -ker=etdff etdff.config'

In  this  manner,  the  compilation  script  is  generated;  then,  the  postprocessor  can  determine  the 
correct  files  and  their  order  required  for  compilation. 

B.2.5 Extracting and Transforming Intermediate C Code. In order to transform the intermediate
C code generated during the "model generate" phase above, type

pbuild  scriptname  outputname.c 

where  scriptname  is  the  compilation  script  generated  during  the  “build”  phase,  and  outputname.c 
is  the  user’s  name  for  the  transformed  C  file. 

Finally, send the new .c file, and all the header files it includes, to the target machine for
sequential (and/or parallel) simulation. The header files are named in the top of the new .c file,
and can be found in the user's work directory.4


3The  .vhd  extension  is  optional. 

4 Unless the user has compiled them into another directory through commands in the VHDL source code.

B.2.5.1 Handling Difficult Files. When large circuit simulations are compiled under
Intermetrics,  the  corresponding  C  code  generated  by  the  postprocessor  may  be  too  large  to  compile 
on  the  Intel  Hypercubes.  There  are  two  methods  of  getting  around  this: 

•  Run  plex  directly  on  each  C  code  description  generated  in  the  model  generate  phase. 

•  Reconstruct  the  VHDL  circuit  using  hierarchical  configuration  descriptions. 

If  plex  is  run  directly  on  each  C  code  file,  the  resulting  output  can  be  compiled  into  separate 
object  files  and  linked  together  on  the  Hypercubes.  Currently,  the  16-bit  shifter  on  the  Intel  i860 
is  constructed  in  this  manner.  The  best  way  to  do  this  is  to  first  try  using  pbuild  directly.  If  this 
big  C  file  does  not  compile,  then  run  plex  on  each  intermediate  C  file.  The  “main”  file — found  by 
examining  the  compilation  script — can  either  be  edited  by  hand,  or  can  be  pulled  in  from  the  big 
C  file  generated  by  running  pbuild. 

When VHDL structural circuit descriptions are built hierarchically, using hierarchical configuration
descriptions, the size of intermediate C code resulting from model generating the overall
configuration  file  is  significantly  reduced.  For  example,  a  Wallace  Tree  multiplier  was  designed 
in  this  manner.  Even  though  the  multiplier  has  about  20  times  more  logic  gates  than  some  other 
VSIM/VHDL  circuit  descriptions,  the  amount  of  C  code  is  about  the  same.  The  postprocessor  does 
not,  however,  catch  all  of  the  include  directives  necessary  for  compilation.  These  can  be  found  and 
inserted  “by  hand”  by  inspecting  each  intermediate  C  code  representation  of  each  configuration. 

For  example,  here  are  portions  of  the  intermediate  C  code  for  the  Wallace  tree  multiplier  prior 
to  transformation: 

/* CGF_WALLACE_TB */
#include "simutl.h"
#include "fn26"
static char Z000006B_trcbck [] = {
60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 71, 70, 95, 87, 65, 76,
76, 65, 67, 69, 95, 84, 66, 0 };
#include "/usr/vhdl/shiplib/tbreeden/FN21712"
#include "/usr/vhdl/shiplib/tbreeden/FN21682"


/* CFG_WALLACE_TREE_2 */
#include "simutl.h"
#include "fn26"
static char Z0000068_trcbck [] = {
60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 70, 71, 95, 87, 65, 76,
76, 65, 67, 69, 95, 84, 82, 69, 69, 95, 50, 0 };
#include "/usr/vhdl/shiplib/tbreeden/FN21682"
#include "/usr/vhdl/shiplib/tbreeden/FN21667"
#include "/usr/vhdl/shiplib/tbreeden/FN21607"
#include "/usr/vhdl/shiplib/tbreeden/FN2665"
#include "/usr/vhdl/shiplib/tbreeden/FN21597"


/* CFG_WALLACE_TREE_1 */
#include "simutl.h"
#include "fn26"
static char Z0000065_trcbck [] = {
60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 70, 71, 95, 87, 65, 76,
76, 65, 67, 69, 95, 84, 82, 69, 69, 95, 49, 0 };
#include "/usr/vhdl/shiplib/tbreeden/FN21667"
#include "/usr/vhdl/shiplib/tbreeden/FN21652"
#include "/usr/vhdl/shiplib/tbreeden/FN21622"
#include "/usr/vhdl/shiplib/tbreeden/FN2665"
#include "/usr/vhdl/shiplib/tbreeden/FN21607"
#include "/usr/vhdl/shiplib/tbreeden/FN21597"


After  transformation,  only  the  two  include  directives  from  the  top  of  the  first  file  are  included  in 
the  transformed  file: 

/* CGF_WALLACE_TB */
#include "vsim.h"
static char Z000006B_trcbck [] = {
60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 71, 70, 95, 87, 65, 76,
76, 65, 67, 69, 95, 84, 66, 0 };
#include "FN21712"
#include "FN21682"


/* CFG_WALLACE_TREE_2 */
static char Z0000068_trcbck [] = {
60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 70, 71, 95, 87, 65, 76,
76, 65, 67, 69, 95, 84, 82, 69, 69, 95, 50, 0 };


/* CFG_WALLACE_TREE_1 */
static char Z0000065_trcbck [] = {
60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 70, 71, 95, 87, 65, 76,
76, 65, 67, 69, 95, 84, 82, 69, 69, 95, 49, 0 };


By  examining  the  initial  intermediate  C  code,  the  user  can  then  put  all  of  the  include  directives  in 
the  top  of  the  transformed  file,  as  shown: 

/* CGF_WALLACE_TB */
#include "vsim.h"
static char Z000006B_trcbck [] = {
60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 71, 70, 95, 87, 65, 76,
76, 65, 67, 69, 95, 84, 66, 0 };
/* Added by TAB, 2 Oct 92 */
#include "FN21712"
#include "FN21682"
#include "FN21667"
#include "FN21607"
#include "FN2665"
#include "FN21597"
#include "FN21652"
#include "FN21622"
#include "FN21637"
#include "FN2635"
#include "FN2645"
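
When inspecting the intermediate C code by hand, it can help to know which configuration a given block belongs to. The trcbck arrays in the listings above appear to be NUL-terminated ASCII strings naming the library and the design unit; this is an inference from the byte values, not something documented here. Under that assumption, the following sketch decodes one of them. The function name trcbck_text is hypothetical.

```c
#include <stdio.h>

/* Byte values copied from the first listing (assumed to be ASCII codes). */
static const char Z000006B_trcbck[] = {
    60, 60, 84, 66, 82, 69, 69, 68, 69, 78, 62, 62, 67, 71, 70, 95, 87, 65, 76,
    76, 65, 67, 69, 95, 84, 66, 0 };

/* The array ends in 0, so it can be used directly as a C string. */
const char *trcbck_text(void)
{
    return Z000006B_trcbck;
}
```

Calling puts(trcbck_text()) prints <<TBREEDEN>>CGF_WALLACE_TB, which matches the comment at the top of the block.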


B.2.6 Running VSIM on a Sequential Machine. As is the case with Intermetrics' simulator,
each gate is dynamically assigned a behavior number in VSIM. VSIM must first be run in sequential
mode in order to see how the behaviors are numbered. To do this, define MAPPING in vsim.h. This
way, when the simulation is run, VSIM reports which behaviors are executing and which behaviors
are consequently scheduled because of that execution, i.e., dependent behaviors.

To specify that the simulation is to be sequential, define SPARC in vsim.h or in the makefile.5
Also, if signal change output is desired, define OUTPUT in vsim.c.

Now compile vinit.c, vsim.c, and optionally vtools.c, with the intermediate C code
circuit description, and run the simulation.


5  Although  the  name  is  SPARC,  sequential  simulations  may  be  compiled  and  run  on  the  hypercube  host  or  most 
likely any other machine with a C compiler, if desired.

# LP index
# Number of input LPs
# LP indices of input LPs
# Polling frequencies of input LPs
# Offset of polling frequency
# Number of input lines
# LP number for each input line
# Number of output LPs
# LP indices of output LPs
# Number of output lines
# LP index for each output line
# Minimum delays for each output line


Figure  56.  Example  Format  for  One  LP  in  an  lpx.arcs  File 
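
The field order of Figure 56 can be captured in a small C structure and writer. This is a sketch only, not part of VSIM or SPECTRUM: the names lp_arcs, write_lp_arcs, MAX_N, and FS_PER_NS are hypothetical, and the layout of one field per line with space-separated values and a single polling offset is an assumption. It follows the notes in Section B.2.7 that polling entries may be zero, that entries for a zero count are omitted, that comments are not written, and that delays are in femtoseconds.

```c
#include <stdio.h>

#define MAX_N     8          /* hypothetical cap for this sketch */
#define FS_PER_NS 1000000L   /* 1 ns = 10^6 fs */

typedef struct {
    int  index;                 /* LP index */
    int  n_in_lps;              /* number of input LPs */
    long in_lps[MAX_N];         /* LP indices of input LPs */
    int  n_in_lines;            /* number of input lines */
    long in_line_lp[MAX_N];     /* LP number for each input line */
    int  n_out_lps;             /* number of output LPs */
    long out_lps[MAX_N];        /* LP indices of output LPs */
    int  n_out_lines;           /* number of output lines */
    long out_line_lp[MAX_N];    /* LP index for each output line */
    long out_delay_fs[MAX_N];   /* minimum delay per output line, in fs */
} lp_arcs;

/* Writes n values on one line, separated by single spaces. */
static void put_row(FILE *out, const long *v, int n)
{
    for (int i = 0; i < n; i++)
        fprintf(out, "%s%ld", i ? " " : "", v[i]);
    fputc('\n', out);
}

/* Writes one LP entry; rows tied to a zero count are omitted. */
void write_lp_arcs(FILE *out, const lp_arcs *lp)
{
    long zeros[MAX_N] = {0};

    fprintf(out, "%d\n", lp->index);
    fprintf(out, "%d\n", lp->n_in_lps);
    if (lp->n_in_lps > 0) {
        put_row(out, lp->in_lps, lp->n_in_lps);
        put_row(out, zeros, lp->n_in_lps);  /* polling frequencies, unused */
        fprintf(out, "0\n");                /* polling offset, unused */
    }
    fprintf(out, "%d\n", lp->n_in_lines);
    if (lp->n_in_lines > 0)
        put_row(out, lp->in_line_lp, lp->n_in_lines);
    fprintf(out, "%d\n", lp->n_out_lps);
    if (lp->n_out_lps > 0)
        put_row(out, lp->out_lps, lp->n_out_lps);
    fprintf(out, "%d\n", lp->n_out_lines);
    if (lp->n_out_lines > 0) {
        put_row(out, lp->out_line_lp, lp->n_out_lines);
        put_row(out, lp->out_delay_fs, lp->n_out_lines);
    }
}
```

A gate delay of 10 ns on an output line would be stored as 10 * FS_PER_NS in out_delay_fs. The exact file layout should be checked against a working lpx.arcs file before relying on this writer.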


B.2.7 Generating Partitioning Strategies. After running the sequential simulation with mapping
turned on, the output can be run through vmap to generate a list of behaviors and dependencies.
This step is not necessarily required; vmap was created to generate an output file that can be used
in future research related to circuit partitioning strategies. If the simulator output is in the file
etdff.raw, then type

vmap etdff.raw etdff.map

to generate the mapping file, etdff.map. If a list of signal changes is desired, the script sgrep is
provided to pull out and sort the signal changes by time and signal name as follows:

sgrep etdff.raw etdff.out

If  desired,  this  data  can  be  compared  with  the  output  of  Intermetrics’  simulator  in  order  to  check 
for  correctness. 

The  user  must  now  decide  how  to  partition  the  circuit  among  LPs.6  Once  the  partition  is 
determined,  an  lpx.arcs  file  must  be  created  to  define  the  LP  dependencies  and  output  delays. 
SPECTRUM  uses  this  file.  Also,  VSIM  reads  an  lpx.map  file  created  to  map  each  behavior  to  an 
LP.  These  two  files  must  be  created  with  great  care.  VSIM  and  SPECTRUM  assume  the  user  knows 
what he/she is doing, and in most cases, they faithfully try to comply. The lpx.arcs and lpx.map
formats are shown in Figure 56 and Table 7, respectively.7


6 The scheme for distributing LPs among processors is defined at run time.

7 The  polling  frequencies  and  offsets  in  the  .arcs  files  are  not  used  with  the  current  filters,  so  zeroes  can  be 
entered.  If  the  number  of  input  or  output  LPs  or  lines  is  zero,  the  other  entries  relating  to  those  LPs  or  lines  are 
omitted.  The  comments  shown  in  Figure  56  and  Table  7  are  not  included.  Although  more  than  one  input  line  is 
permitted  from  each  LP,  communications  from  one  LP  to  another  in  VSIM  can  be  considered  to  occur  on  one  input 
line. Delays are in femtoseconds.

Behavior    LP Number
0           0
1           0
2           1
3           2

Table 7. Example Format for the lpx.map File
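
Reading a mapping in the format of Table 7 can be sketched as follows. This is an illustration, not the code VSIM uses: the function name parse_lp_map is hypothetical, and the sketch assumes the file holds only whitespace-separated pairs of integers (behavior number, then LP number) with no header row or comments.

```c
#include <stdio.h>

/*
 * Parses "behavior lp" pairs from text into lp_of[behavior].
 * Returns the number of pairs read, or -1 if a behavior number is
 * outside 0..max_behaviors-1 or an LP number is negative.
 */
int parse_lp_map(const char *text, int *lp_of, int max_behaviors)
{
    int behavior, lp, used, count = 0;

    /* %n records how many characters the two integers consumed. */
    while (sscanf(text, "%d %d%n", &behavior, &lp, &used) == 2) {
        if (behavior < 0 || behavior >= max_behaviors || lp < 0)
            return -1;
        lp_of[behavior] = lp;
        text += used;
        count++;
    }
    return count;
}
```

For the mapping in Table 7, the text "0 0\n1 0\n2 1\n3 2\n" assigns behaviors 0 and 1 to LP 0, behavior 2 to LP 1, and behavior 3 to LP 2.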


B.2.8 Running VSIM on a Parallel Machine. Before compiling, be sure the desired number
of LPs and the LP input file are specified in application.h. If application.h is in the user's
~/spectrum directory, this can be done by typing setlps x, where x is the number of LPs desired.8

Remove the MAPPING and SPARC definitions and compile the intermediate C code with vinit.c,
vsim.c, vtools.c (optional), lp_man.c, cube2.c, u_null_filt.c, and vfilt.c. This generates
the executable program that is loaded on the processors and represents each LP. Note that on the
iPSC/2, more than one LP may be loaded on a processor due to the multitasking capabilities of the
Intel 80386 processors. On the iPSC/i860, however, the number of LPs must match the number of
processors.

The  host  program  is  used  to  load  the  LPs  on  the  processors.  It’s  created  by  compiling 
host2.c. 

When the necessary files are compiled, type host. Among other things, it asks for the name
of the program to load, the number of processors desired, and the number of LPs. The number of
LPs must match the number specified in application.h. If not, the program "bails out."

Each LP reports when it is finished running, and after every LP has completed, the host
program reports time and message statistics. If OUTPUT was defined in vsim.c and vspec.c, the
output can be found in the group of files labeled lpx.out, where x is the LP number. Timing
information can be found in logx, x again being the specific LP number. If DEBUG or REPORT is
set to '1' in globals.h, the logx files report more information than humanly consumable. This
comes from lp_man.c and cube2.c. Usually, filters also have DEBUG output, but this author chose
to leave it out of vfilt.c for simplicity.

Finally,  the  lpx.out  files  can  be  concatenated  (provided  OUTPUT  was  defined)  and  sgrep  can 
be  invoked  to  generate  a  file  that  can  be  compared  with  the  sequential  output. 
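
The ordering that makes the concatenated parallel output comparable with the sequential output is a sort by time and then by signal name. The sgrep script itself is not reproduced here; the following is only a sketch of that ordering in C, with a hypothetical record type sig_change and hypothetical function names.

```c
#include <stdlib.h>
#include <string.h>

/* One signal change (hypothetical record layout for this sketch). */
typedef struct {
    long time_fs;    /* simulation time of the change */
    char name[32];   /* signal name */
    int  value;      /* new bit value, 0 or 1 */
} sig_change;

/* Orders by time first, then alphabetically by signal name. */
static int cmp_sig_change(const void *a, const void *b)
{
    const sig_change *x = a;
    const sig_change *y = b;

    if (x->time_fs != y->time_fs)
        return (x->time_fs < y->time_fs) ? -1 : 1;
    return strcmp(x->name, y->name);
}

/* Sorts n signal changes in place. */
void sort_sig_changes(sig_change *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_sig_change);
}
```

Because each LP writes its own lpx.out file, changes from different LPs arrive interleaved out of time order after concatenation; sorting both the parallel and the sequential records with the same key makes a line-by-line comparison meaningful.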


B.3 Example: An Edge-Triggered D Flip-Flop.

B.3.1 Introduction. This section goes through an example using the edge-triggered D flip-flop
of Figure 57 on page 114. The VHDL source code is compiled and run on a SPARC station in
the  AFIT  VLSI  lab,  sequential  VSIM  is  run  on  a  SPARC  station  in  the  Parallel  lab,  and  output  is 
compared  to  Intermetrics’  output.  Finally,  parallel  simulations  are  executed  on  the  Intel  iPSC/2 
Hypercube — one  simulation  with  two  LPs  and  no  feedback,  the  other  with  three  LPs  and  feedback 
between  two  of  the  LPs. 

All figures referenced in this example are located at the end of the document.


8 The program setlps simply changes any integer in the first eight lines of application.h to the specified integer.
For more than nine LPs, the user must modify the file directly.

B.3.2 VHDL Source Code. First, two- and three-input NAND gates are created, as shown
in Figure 58 on page 115. For this example, these entity/architecture descriptions are located in a
user file called "nand_nor.vhd."

Next, the NAND gates are structurally connected to form the edge-triggered D flip-flop. This
description, shown in Figure 59 (page 116), is in a file called "et_dff.vhd."

To test this circuit, a "test bench" is written to apply input signals and receive output signals.
This file, et_dff_test_bench.vhd, is shown in Figure 60 (page 117), and the schematic is shown
in Figure 61 on page 118.

The last VHDL source file is the "configuration file," which structurally connects the components,
as shown in Figure 62. This file is called et_dff_config.vhd.

Intermetrics uses a "report control language" to generate a report of desired signal changes.
The file for this example, et_dff.rcl, is shown in Figure 63.

B.3.3 Compiling, Model Generating, Building, and Simulating under Intermetrics. A script
like  that  of  Figure  64,  on  page  120,  can  be  run  to  compile,  model  generate,  build,  and  simulate 
the  circuit  with  Intermetrics  VHDL.  Notice  the  placement  of  “-debug=cknd”  in  the  mg  and  build 
phases.  This  generates  the  intermediate  C  code  and  build  script  required  for  the  postprocessor, 
pbuild. 

The  following  is  an  example  session  using  the  script  of  Figure  64: 


lovelace:~/vhdl/et_dff>et_dff

vhdl ~/vhdl/aox_gates/nand_nor.vhd
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

vhdl et_dff.vhd
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

vhdl et_dff_test_bench.vhd
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

vhdl et_dff_config.vhd
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

mg '-debug=cknd nand_gate(simple)'
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

Object_file : /home/inter/shiplib/tbreeden/FN272.o
H file      : /home/inter/shiplib/tbreeden/FN273
C file      : /home/inter/shiplib/tbreeden/FN274.c

mg '-debug=cknd three_input_nand_gate(simple)'
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

Object_file : /home/inter/shiplib/tbreeden/FN282.o
H file      : /home/inter/shiplib/tbreeden/FN283
C file      : /home/inter/shiplib/tbreeden/FN284.c

mg '-debug=cknd et_dff(structural)'
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

Object_file : /home/inter/shiplib/tbreeden/FN2102.o
H file      : /home/inter/shiplib/tbreeden/FN2103
C file      : /home/inter/shiplib/tbreeden/FN2104.c

mg '-debug=cknd et_dff_test_bench(structural)'
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

Object_file : /home/inter/shiplib/tbreeden/FN2112.o
H file      : /home/inter/shiplib/tbreeden/FN2113
C file      : /home/inter/shiplib/tbreeden/FN2114.c

mg '-debug=cknd -top et_dff_config'
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

Object_file : /home/inter/shiplib/tbreeden/FN2117.o
H file      : /home/inter/shiplib/tbreeden/FN2118
C file      : /home/inter/shiplib/tbreeden/FN2119.c

build '-debug=cknd -replace -ker=et_dff et_dff_config'
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

Kernel com file is /home/inter/shiplib/tbreeden/FN2122

sim et_dff
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

SIGTRAI Signal Tracing turned on
QUIESCE Quiescent state reached with no response after 512 ns

rg et_dff et_dff.rcl
Standard VHDL 1076 Support Environment Version 2.1 - 1 September 1990
Copyright (C) 1990 Intermetrics, Inc. All rights reserved.

lovelace:~/vhdl/et_dff>


Here is the output from Intermetrics' simulator, found in et_dff.rpt:9

TIME |------SIGNAL NAMES------
     |
(NS) | A B CKT_Q_OUT CKT_Q_BAR_OUT


9 The ♦ values are delta delays and can be considered to have a delta time value of zero.

B.3.4 Using the Postprocessor to Generate Intermediate C Code. Notice that after the
build phase, Intermetrics reported a "Kernel com" file, FN2122. This is the compilation
build script pbuild uses to build the intermediate C code, et_dff.c, as shown below. The
report shown is always written to a file called plex.log.

lovelace:~/vhdl/et_dff>pbuild FI2122 et_dff.c
cp /home/inter/shiplib/tbreeden/FI2119.c big_et_dff.c
cat /home/inter/shiplib/tbreeden/FI284.c >> big_et_dff.c
cat /home/inter/shiplib/tbreeden/FI274.c >> big_et_dff.c
cat /home/inter/shiplib/tbreeden/FI2104.c >> big_et_dff.c
cat /home/inter/shiplib/tbreeden/FI2114.c >> big_et_dff.c
cat /home/inter/shiplib/tbreeden/FI2124.c >> big_et_dff.c
plex < big_et_dff.c > et_dff.c
Transformation in progress...


Approx lines:                               1964
Comments:                                      5
#include directives modified:                  8
#include directives removed:                  13
{trace... changed to {...                     12
if(trceqp) tests removed:                     21
"trace" or "TRAREC" lines removed:           133
Zlxxxxxx() calls removed:                      4
ZSxxxxxx() functions modified:                 1
Scalar "mksig" assignments modified:          10
Bit vector "mksig" assignments modified:       0
#ifdef MAPPING added:                          6
Other function calls removed:
    close_sigdict():                           1
    m_int_type():                              0
    m_real_type():                             1
    pop():                                    13
    push():                                   13
    read_input():                              1
    rmtrrec():                                 0
    rptstats():                                1
    rpterr():                                 23
    Start_Nonarray_Comp():                     0
    sched():                                   0
    timer():                                   1
    tpop():                                   31

In addition to et_dff.c, copy all H files over to the iPSC/2.
lovelace:~/vhdl/et_dff>


B.3.5 Sequential Simulation with VSIM. The intermediate C code, et_dff.c, and the
header files it includes, FI2113, FI283, FI273, and FI2103, are now linked with VSIM and
simulated on a sequential machine (neptune in this example). First, MAPPING is defined in
vsim.h and OUTPUT is defined in vsim.c and vspec.c.10 The following makefile compiles and
links for either sequential simulations (by typing make vsim) or parallel simulations (by
typing make ipsc for the iPSC/2 or make for the iPSC/i860):


10 The makefile defines SPARC.


# SPARC macros
S_SIMPATH=/olympus3/eng/tbreeden/vsim
S_CKTPATH=/olympus3/eng/tbreeden/et_dff
S_SPECPATH=/olympus3/eng/tbreeden/spectrum
S_OBJS=${S_SIMPATH}/vsim.o ${S_SIMPATH}/vinit.o ${S_SIMPATH}/vtools.o \
	${S_CKTPATH}/et_dff.o
S_CFLAGS=-c -w -g -DSPARC

# iPSC/2 macros
I_SIMPATH=/usr2/eng/tbreeden/vsim
I_CKTPATH=/usr2/eng/tbreeden/et_dff
MYSPECPATH=/usr2/eng/tbreeden/spectrum
UVAPATH=/usr/simulate/spectrum/uva
AFITPATH=/usr/simulate/spectrum/afit
AFIT_INC=/usr/simulate/spectrum/afit/include
FILTERPATH=${MYSPECPATH}
SPECHEADERS=${MYSPECPATH}/globals.h ${MYSPECPATH}/application.h
NODE_OBJS=${I_SIMPATH}/vsim.o ${I_SIMPATH}/vinit.o ${I_SIMPATH}/vtools.o \
	${I_SIMPATH}/vspec.o ${MYSPECPATH}/lp_man.o ${MYSPECPATH}/cube2.o \
	${MYSPECPATH}/u_null_filt.o ${MYSPECPATH}/vfilt.o \
	${I_CKTPATH}/et_dff.o
I_CFLAGS=-c -H

# iPSC/i860 macros
I8_SIMPATH=/usr2/tbreeden/vsim
I8_CKTPATH=/usr2/tbreeden/et_dff
MY8SPECPATH=/usr2/tbreeden/spectrum
UVA8PATH=/usr2/tbreeden/spectrum
AFIT8PATH=/usr2/tbreeden/spectrum
AFIT8_INC=/usr2/tbreeden/spectrum
FILTER8PATH=${MY8SPECPATH}
SPEC8HEADERS=${MY8SPECPATH}/globals.h ${MY8SPECPATH}/application.h
NODE8_OBJS=${I8_SIMPATH}/vsim.o ${I8_SIMPATH}/vinit.o ${I8_SIMPATH}/vtools.o \
	${I8_SIMPATH}/vspec.o ${MY8SPECPATH}/lp_man.o ${MY8SPECPATH}/cube2.o \
	${MY8SPECPATH}/u_null_filt.o ${MY8SPECPATH}/vfilt.o \
	${I8_CKTPATH}/et_dff.o
I860CC=icc

# other macros
CC=cc

# ---- iPSC/i860 ----

all: host8 node8

host8: ${MY8SPECPATH}/host2.o
	$(CC) -o host ${MY8SPECPATH}/host2.o -host

node8: ${NODE8_OBJS}
	$(I860CC) -o et_dff ${NODE8_OBJS} -node

${MY8SPECPATH}/host2.o: ${MY8SPECPATH}/host2.c ${AFIT8_INC}/cube2.h
	cd ${MY8SPECPATH}; \
	$(CC) ${I_CFLAGS} -I${AFIT8_INC} ${MY8SPECPATH}/host2.c

${I8_SIMPATH}/vsim.o: ${I8_SIMPATH}/vsim.c ${I8_SIMPATH}/vsim.h \
	${MY8SPECPATH}/application.h
	cd ${I8_SIMPATH}; \
	$(I860CC) ${I_CFLAGS} -I${MY8SPECPATH} vsim.c

${I8_SIMPATH}/vinit.o: ${I8_SIMPATH}/vinit.c ${I8_SIMPATH}/vsim.h \
	${MY8SPECPATH}/application.h
	cd ${I8_SIMPATH}; \
	$(I860CC) ${I_CFLAGS} -I${MY8SPECPATH} vinit.c

${I8_SIMPATH}/vtools.o: ${I8_SIMPATH}/vtools.c ${I8_SIMPATH}/vsim.h
	cd ${I8_SIMPATH}; \
	$(I860CC) ${I_CFLAGS} vtools.c

${I8_SIMPATH}/vspec.o: ${I8_SIMPATH}/vspec.c ${I8_SIMPATH}/vsim.h \
	${MY8SPECPATH}/application.h
	cd ${I8_SIMPATH}; \
	$(I860CC) ${I_CFLAGS} -I${MY8SPECPATH} vspec.c

${MY8SPECPATH}/lp_man.o: ${UVA8PATH}/lp_man.c ${SPEC8HEADERS}
	cd ${MY8SPECPATH}; \
	$(I860CC) ${I_CFLAGS} -I${MY8SPECPATH} ${UVA8PATH}/lp_man.c

${MY8SPECPATH}/cube2.o: ${AFIT8PATH}/cube2.c ${AFIT8PATH}/cube2.c \
	${SPEC8HEADERS} ${AFIT8_INC}/cube2.h
	cd ${MY8SPECPATH}; \
	$(I860CC) ${I_CFLAGS} ${AFIT8PATH}/cube2.c

${MY8SPECPATH}/u_null_filt.o: ${FILTER8PATH}/u_null_filt.c \
	${SPEC8HEADERS}
	cd ${MY8SPECPATH}; \
	$(I860CC) ${I_CFLAGS} -I${MY8SPECPATH} \
	${FILTER8PATH}/u_null_filt.c

${MY8SPECPATH}/vfilt.o: ${FILTER8PATH}/vfilt.c ${SPEC8HEADERS}
	cd ${MY8SPECPATH}; \
	$(I860CC) ${I_CFLAGS} -DVHDL -I${MY8SPECPATH} \
	${FILTER8PATH}/vfilt.c

${I8_CKTPATH}/et_dff.o: et_dff.c ${I8_SIMPATH}/vsim.h
	$(I860CC) ${I_CFLAGS} -I${I8_SIMPATH} et_dff.c

# ---- iPSC/2 ----

ipsc: host node

host: ${MYSPECPATH}/host2.o
	$(CC) -o host ${MYSPECPATH}/host2.o -host

node: ${NODE_OBJS}
	$(CC) -o et_dff ${NODE_OBJS} -node

${MYSPECPATH}/host2.o: ${MYSPECPATH}/host2.c ${AFIT_INC}/cube2.h
	cd ${MYSPECPATH}; \
	$(CC) ${I_CFLAGS} -I${AFIT_INC} ${MYSPECPATH}/host2.c

${I_SIMPATH}/vsim.o: ${I_SIMPATH}/vsim.c ${I_SIMPATH}/vsim.h \
	${MYSPECPATH}/application.h
	cd ${I_SIMPATH}; \
	$(CC) ${I_CFLAGS} -I${MYSPECPATH} vsim.c

${I_SIMPATH}/vinit.o: ${I_SIMPATH}/vinit.c ${I_SIMPATH}/vsim.h \
	${MYSPECPATH}/application.h
	cd ${I_SIMPATH}; \
	$(CC) ${I_CFLAGS} -I${MYSPECPATH} vinit.c

${I_SIMPATH}/vtools.o: ${I_SIMPATH}/vtools.c ${I_SIMPATH}/vsim.h
	cd ${I_SIMPATH}; \
	$(CC) ${I_CFLAGS} vtools.c

${I_SIMPATH}/vspec.o: ${I_SIMPATH}/vspec.c ${I_SIMPATH}/vsim.h \
	${MYSPECPATH}/application.h
	cd ${I_SIMPATH}; \
	$(CC) ${I_CFLAGS} -I${MYSPECPATH} vspec.c

${MYSPECPATH}/lp_man.o: ${UVAPATH}/lp_man.c ${SPECHEADERS}
	cd ${MYSPECPATH}; \
	$(CC) ${I_CFLAGS} -I${MYSPECPATH} ${UVAPATH}/lp_man.c

${MYSPECPATH}/cube2.o: ${AFITPATH}/cube2.c ${AFITPATH}/cube2.c \
	${SPECHEADERS} ${AFIT_INC}/cube2.h
	cd ${MYSPECPATH}; \
	$(CC) ${I_CFLAGS} -I${AFIT_INC} -I${MYSPECPATH} \
	${AFITPATH}/cube2.c

${MYSPECPATH}/u_null_filt.o: ${FILTERPATH}/u_null_filt.c \
	${SPECHEADERS}
	cd ${MYSPECPATH}; \
	$(CC) ${I_CFLAGS} -I${MYSPECPATH} \
	${FILTERPATH}/u_null_filt.c

${MYSPECPATH}/vfilt.o: ${FILTERPATH}/vfilt.c ${SPECHEADERS}
	cd ${MYSPECPATH}; \
	$(CC) ${I_CFLAGS} -DVHDL -I${MYSPECPATH} \
	${FILTERPATH}/vfilt.c

${I_CKTPATH}/et_dff.o: et_dff.c ${I_SIMPATH}/vsim.h
	$(CC) ${I_CFLAGS} -I${I_SIMPATH} et_dff.c


# ---- SPARC ----

vsim: ${S_OBJS}
	$(CC) -o et_dff -g ${S_OBJS}

${S_SIMPATH}/vsim.o: ${S_SIMPATH}/vsim.c ${S_SIMPATH}/vsim.h
	cd ${S_SIMPATH}; \
	$(CC) ${S_CFLAGS} -I${S_SPECPATH} vsim.c

${S_SIMPATH}/vinit.o: ${S_SIMPATH}/vinit.c ${S_SIMPATH}/vsim.h
	cd ${S_SIMPATH}; \
	$(CC) ${S_CFLAGS} vinit.c

${S_SIMPATH}/vtools.o: ${S_SIMPATH}/vtools.c ${S_SIMPATH}/vsim.h
	cd ${S_SIMPATH}; \
	$(CC) ${S_CFLAGS} vtools.c

${S_CKTPATH}/et_dff.o: et_dff.c ${S_SIMPATH}/vsim.h
	$(CC) ${S_CFLAGS} -I${S_SIMPATH} et_dff.c


After compiling, a sequential simulation may be run. For this example, the command is
et_dff > temp

The output, in temp, is already in time order11; however, sgrep sorts by time and then signal name.
The following command is typed:

sgrep temp et_dff.out

The output is now sorted by time and signal name, and can be compared with Intermetrics' output
for accuracy. Using grep, the values for CKT_Q_OUT are12

3 ns, CKT_Q_OUT from 0 to 1
6 ns, CKT_Q_OUT from 1 to 0
9 ns, CKT_Q_OUT from 0 to 1
12 ns, CKT_Q_OUT from 1 to 0
15 ns, CKT_Q_OUT from 0 to 1
18 ns, CKT_Q_OUT from 1 to 0
21 ns, CKT_Q_OUT from 0 to 1
24 ns, CKT_Q_OUT from 1 to 0
206 ns, CKT_Q_OUT from 0 to 1
359 ns, CKT_Q_OUT from 1 to 0
506 ns, CKT_Q_OUT from 0 to 1


11 This is not the case for parallel simulations.

12 For complete accuracy, every signal change should be examined. Only one signal was shown here for brevity.
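
The sorting step can be illustrated with a short sketch. The source of sgrep is not shown in this guide, so the following Python is only an assumed equivalent: it parses signal-change lines of the form shown above, converts times to femtoseconds, and orders them by time and then by signal name. The names `parse_event` and `sort_events` are hypothetical.

```python
import re

# Femtoseconds per time unit; the VSIM output in this guide uses fs and ns.
UNIT_FS = {"fs": 1, "ps": 10**3, "ns": 10**6}

# Matches lines such as "3 ns, CKT_Q_OUT from 0 to 1".
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(fs|ps|ns),\s+(\S+)\s+from\s+(\S+)\s+to\s+(\S+)\s*$"
)


def parse_event(line):
    """Return (time_fs, signal, old, new), or None for non-event lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    value, unit, signal, old, new = m.groups()
    return (int(value) * UNIT_FS[unit], signal, old, new)


def sort_events(lines):
    """Keep only signal-change lines and sort by time, then signal name."""
    events = [e for e in map(parse_event, lines) if e is not None]
    return sorted(events, key=lambda e: (e[0], e[1]))
```

Lines that are not signal changes (for example the behavior-mapping lines) are simply skipped, which is one plausible way to separate the two kinds of output in temp.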


B.3.6 Extracting Behavior Information using VMAP. Since MAPPING was defined, the output
in temp also contains behavioral information: specifically, behavior names, id numbers, and
dependencies, as shown here:

0 fs, executing beh 9: «TBREEDEN»ET_DFF_TEST_BENCH(STRUCTURAL)
Add behav 1 to active list at 0 fs
Add behav 2 to active list at 0 fs
Add behav 1 to active list at 15 ns
Add behav 2 to active list at 15 ns
Add behav 1 to active list at 20 ns
Add behav 2 to active list at 20 ns
Add behav 1 to active list at 50 ns
Add behav 2 to active list at 50 ns
Add behav 1 to active list at 150 ns
Add behav 2 to active list at 150 ns
Add behav 1 to active list at 200 ns
Add behav 2 to active list at 200 ns
Add behav 1 to active list at 300 ns
Add behav 2 to active list at 300 ns
Add behav 1 to active list at 350 ns
Add behav 2 to active list at 350 ns
Add behav 1 to active list at 450 ns
Add behav 2 to active list at 450 ns
Add behav 1 to active list at 500 ns
Add behav 2 to active list at 500 ns
0 fs, executing beh 8: «TBREEDEN»ET_DFF_TEST_BENCH(STRUCTURAL)
Add behav 3 to active list at 0 fs
Add behav 3 to active list at 100 ns
Add behav 3 to active list at 250 ns
Add behav 3 to active list at 400 ns
0 fs, executing beh 7: «TBREEDEN»ET_DFF(STRUCTURAL)
0 fs, executing beh 6: «TBREEDEN»ET_DFF(STRUCTURAL)
0 fs, executing beh 5: «TBREEDEN»NAND_GATE(SIMPLE)
Add behav 4 to active list at 3 ns
Add behav 7 to active list at 3 ns
0 fs, executing beh 4: «TBREEDEN»NAND_GATE(SIMPLE)
Add behav 5 to active list at 3 ns
Add behav 6 to active list at 3 ns
0 fs, executing beh 3: «TBREEDEN»NAND_GATE(SIMPLE)
Add behav 0 to active list at 3 ns
Add behav 2 to active list at 3 ns
0 fs, executing beh 2: «TBREEDEN»THREE_INPUT_NAND_GATE(SIMPLE)
Add behav 3 to active list at 3 ns
Add behav 5 to active list at 3 ns
0 fs, executing beh 1: «TBREEDEN»NAND_GATE(SIMPLE)
Add behav 0 to active list at 3 ns
Add behav 2 to active list at 3 ns
Add behav 4 to active list at 3 ns
0 fs, executing beh 0: «TBREEDEN»NAND_GATE(SIMPLE)
Add behav 1 to active list at 3 ns

Using  vmap,  this  information  can  be  filtered  out  of  temp  and  saved.  The  vmap  program 
attempts  to  “guess”  the  delays  of  each  behavior,  based  on  when  dependent  behaviors  are  scheduled. 
The  user  is  given  a  chance  to  override  these  guesses.  In  most  cases,  the  behaviors  which  represent 
gates  show  correct  delays;  the  other  “system”  behaviors  should  be  set  to  a  delay  of  zero.  Here  is 
how  vmap  is  used  for  this  example: 

neptune:~/et_dff>vmap temp et_dff.map
Collecting behavior names and delays...

ET_DFF_TEST_BENCH(STRUCTURAL) Delay = 0
ET_DFF(STRUCTURAL) Delay = 3000000
NAND_GATE(SIMPLE) Delay = 3000000
THREE_INPUT_NAND_GATE(SIMPLE) Delay = 3000000
Change delays? y

ET_DFF_TEST_BENCH(STRUCTURAL) Delay = 0
Change delay? n

ET_DFF(STRUCTURAL) Delay = 3000000
Change delay? y

Enter new delay: 0

NAND_GATE(SIMPLE) Delay = 3000000
Change delay? n

THREE_INPUT_NAND_GATE(SIMPLE) Delay = 3000000
Change delay? n

ET_DFF_TEST_BENCH(STRUCTURAL) Delay = 0
ET_DFF(STRUCTURAL) Delay = 0
NAND_GATE(SIMPLE) Delay = 3000000
THREE_INPUT_NAND_GATE(SIMPLE) Delay = 3000000
Change delays? n


Output written to et_dff.map

neptune:~/et_dff> more et_dff.map
9 ET_DFF_TEST_BENCH(STRUCTURAL) 0 1 2
8 ET_DFF_TEST_BENCH(STRUCTURAL) 0 3
7 ET_DFF(STRUCTURAL) 0
6 ET_DFF(STRUCTURAL) 0
5 NAND_GATE(SIMPLE) 3000000 4 7
4 NAND_GATE(SIMPLE) 3000000 5 6
3 NAND_GATE(SIMPLE) 3000000 0 2
2 THREE_INPUT_NAND_GATE(SIMPLE) 3000000 3 5
1 NAND_GATE(SIMPLE) 3000000 0 2 4
0 NAND_GATE(SIMPLE) 3000000 1
neptune:~/et_dff>
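
How dependencies and delay guesses might be collected from the mapping output can be sketched as follows. The vmap source is not reproduced in this guide, so the heuristic here is an assumption: a behavior's delay is taken as the smallest gap between its execution time and the times at which it schedules dependents, and zero when it schedules none. This does not explain every initial guess in the transcript above (for example, 3000000 for ET_DFF(STRUCTURAL)). The function name `collect_behaviors` is hypothetical.

```python
import re

UNIT_FS = {"fs": 1, "ps": 10**3, "ns": 10**6}

# "0 fs, executing beh 5: NAND_GATE(SIMPLE)"
EXEC_RE = re.compile(
    r"^\s*(\d+)\s+(fs|ps|ns),\s+executing beh\s+(\d+):\s*(.+?)\s*$"
)
# "Add behav 4 to active list at 3 ns"
ADD_RE = re.compile(
    r"^\s*Add behav\s+(\d+)\s+to active list at\s+(\d+)\s+(fs|ps|ns)\s*$"
)


def collect_behaviors(lines):
    """Map behavior id -> {'name', 'deps', 'delay'} (delay in fs)."""
    behaviors = {}
    current = None   # record of the behavior currently executing
    now = 0          # its execution time in fs
    for line in lines:
        m = EXEC_RE.match(line)
        if m:
            t, unit, beh_id, name = m.groups()
            now = int(t) * UNIT_FS[unit]
            current = behaviors.setdefault(
                int(beh_id), {"name": name, "deps": [], "delay": None}
            )
            continue
        m = ADD_RE.match(line)
        if m and current is not None:
            dep, t, unit = m.groups()
            gap = int(t) * UNIT_FS[unit] - now
            if int(dep) not in current["deps"]:
                current["deps"].append(int(dep))
            if current["delay"] is None or gap < current["delay"]:
                current["delay"] = gap
    for rec in behaviors.values():
        if rec["delay"] is None:   # assumption: no dependents means zero delay
            rec["delay"] = 0
    return behaviors
```

As in the transcript, any guess produced this way would still need to be reviewed by the user, since "system" behaviors should end up with a delay of zero.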


The format for et_dff.map, shown above, is

{behavior_id behavior_name delay {dependent_behaviors}0+ newline}1+
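
A reader for this format can be sketched in a few lines. This is only an illustration, assuming whitespace-separated fields and behavior names that contain no spaces (as in the cleaned listing above); `parse_map_line` and `parse_map` are hypothetical names, not part of VSIM or vmap.

```python
def parse_map_line(line):
    """Split one map line into (behavior_id, name, delay, dependents)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"malformed map line: {line!r}")
    behavior_id = int(fields[0])
    name = fields[1]
    delay = int(fields[2])                 # delay in femtoseconds
    dependents = [int(f) for f in fields[3:]]   # zero or more ids
    return behavior_id, name, delay, dependents


def parse_map(text):
    """Parse a whole map file into a dict keyed by behavior id."""
    records = {}
    for line in text.splitlines():
        if line.strip():                   # ignore blank lines
            rec = parse_map_line(line)
            records[rec[0]] = rec
    return records
```

For example, the line `1 NAND_GATE(SIMPLE) 3000000 0 2 4` yields behavior 1 with a 3 ns delay and dependents 0, 2, and 4.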

Currently,  the  only  way  to  map  behavior  numbers  to  behaviors  is  to  compare  the  output  of 
either  VSIM  or  vmap  to  the  schematic.  For  the  edge-triggered  D  flip-flop,  this  is  shown  in  Figure  65 
on  page  120. 

B.3.7 Generating .arcs and .map Files for Partitioning.

B.3.7.1 A 1-LP Configuration. The whole circuit can be simulated as one LP. This
configuration can be used to compare timing data, etc., with other configurations. An lp1.map file
is not required; however, an lp1.arcs file is required and is written as follows:

0 

0 

0 

0 

0 


B.3.7.2 A 2-LP Configuration. The first configuration to be tested has 2 LPs. LP0
contains behaviors 0, 1, 2, 3, 8, and 9. LP1 contains behaviors 4, 5, 6, and 7. See Figure 66 on
page 121. The arcs file that SPECTRUM uses is lp2.arcs, and it contains the following mapping:


0 

0 

0 

1 

1 

1 

1 

3000000 


1

1

0

0

0

1

0 

0 


The  map  file  is  for  VSIM  to  identify  which  LPs  “own”  which  behaviors.  VSIM  always  expects  this 
filename  to  be  “lpx.map”,  where  the  number  of  LPs  replaces  x.  Therefore,  lp2.map  is  written  as 
follows: 

0 0
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 0
9 0
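
Writing the map file by hand is error-prone (each behavior must appear exactly once), so a small generator may help. The sketch below is an assumption about a convenient workflow, not part of VSIM: it takes a partition as a dictionary from LP number to behavior ids and emits the "behavior LP" lines and the lpx.map filename described above. The function names are hypothetical.

```python
def lp_map_lines(partition, num_behaviors):
    """Return 'behavior_id lp' lines for behaviors 0..num_behaviors-1.

    partition: dict mapping LP number -> iterable of behavior ids.
    """
    owner = {}
    for lp, behaviors in partition.items():
        for b in behaviors:
            if b in owner:
                raise ValueError(f"behavior {b} assigned to more than one LP")
            owner[b] = lp
    missing = [b for b in range(num_behaviors) if b not in owner]
    if missing:
        raise ValueError(f"behaviors without an LP: {missing}")
    return [f"{b} {owner[b]}" for b in range(num_behaviors)]


def lp_map_filename(partition):
    """VSIM expects 'lpx.map', where x is the number of LPs."""
    return f"lp{len(partition)}.map"
```

Calling `lp_map_lines({0: [0, 1, 2, 3, 8, 9], 1: [4, 5, 6, 7]}, 10)` reproduces the ten lines of lp2.map for the 2-LP configuration.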


B.3.7.3 A 3-LP Configuration with Feedback. This configuration, shown in Figure 67
(page 121), is used to demonstrate VSIM's capability to handle feedback among LPs. The .arcs
file is lp3.arcs, and contains the following:

0 

0 

0 

2 

1  2 
2 

1  2 

3000000  3000000 


2 

0 

0 


1 
2 
0 
0 
0 
2 

0  2 
1 
2 
1 
2 

3000000 




1 

0 

0 


2 
2 
0 
0 
0 
2 

0  1 
1 
1 
1 
1 

3000000 


Then,  lp3.map  is  written  as  follows: 

0 0
1 0
2 0
3 0
4 1
5 2
6 1
7 2
8 0
9 0


B.3.8 Parallel Simulation. Prior to simulating in parallel, MAPPING is turned off in vsim.h.
This is not a requirement, but mapping information is no longer needed. Prior to changing the
number of LPs for any simulation, application.h is modified, using setlps, to define NUM_PROCS
and INPUT_ARCS, the number of LPs and the .arcs filename, respectively. The intermediate C
code, its header files, and the lpx.arcs and lpx.map files are sent to the hypercube and compiled
each time the number of LPs is changed.

B.3.8.1 Simulating the Edge-Triggered D Flip-Flop as one LP. The number of LPs
is set to 1 and the same makefile is used (this time typing “make”) to compile. The simulation is
started by typing host. Here is an example:

c386 8: host

Which application do you want to use?:et_dff
Enter the command line arguments for the program
>

Is assignment of logical processes to nodes to be from a file? (y/n) -> n
How many cube nodes do you want to use?:1
How many LP's are in this application?:1

Do you want to use the 'natural' node assignment? (y/n): y
Getting cube of size 1 - stand by.
load -H -p 0 0 et_dff
startcube
Cube Loaded

LAST_TIME message from LP 0 on node 0, pid 0.

End stats messages:

LP 0 (node 0, pid 0): 0 received, 0 sent.

Max message count set at 10, Max messages removed was 0.
HOST: Total CPU time waiting: 0.000000 (msecs)

HOST: Wall clock time loading cube: 7 (secs)

HOST: Wall clock time waiting: 4 (secs)
c386 9:


Now, the output is found in lp1.out and can be compared to et_dff.out, which is the previously
verified output. Also, an LP report file, log0, is generated by SPECTRUM with the following
information:

LP  0  wall  time  taken  is  4.194  (secs) 

LP  0  messages  received  0 
LP  0  messages  sent  0 


B.3.8.2 Simulating the Edge-Triggered D Flip-Flop as more than one LP. The process
is the same as for one LP; however, the output is spread across the lpx.out files. For example, the
two LP configuration's output is found in lp0.out and lp1.out. These two files are concatenated
and sgrep is used to sort them. The output from sgrep is verified against et_dff.out. The results
of all 1, 2, and 3 LP configurations are shown in Table 8.13,14
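
The verification step can also be expressed as an order-independent comparison. The following Python is a sketch under the assumption that each output file holds one signal change per line and that a parallel run is correct when the combined LP outputs contain exactly the same lines as the verified sequential output, regardless of order. It is an alternative to concatenating and sorting, not the procedure used above; `same_events` is a hypothetical name.

```python
from collections import Counter


def normalized(lines):
    """Collapse runs of whitespace and drop blank lines."""
    return [" ".join(line.split()) for line in lines if line.strip()]


def same_events(reference_lines, per_lp_lines):
    """Compare the reference output with the union of all LP outputs.

    reference_lines: lines of the verified output (e.g. et_dff.out).
    per_lp_lines: list of line lists, one per LP output file.
    Returns True when both contain the same lines with the same counts.
    """
    merged = [line for lp in per_lp_lines for line in lp]
    return Counter(normalized(reference_lines)) == Counter(normalized(merged))
```

Using a multiset (Counter) rather than a set means a duplicated or dropped event in one LP's output is detected even if the same line legitimately appears elsewhere.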

B.3.9 Summary. This guide demonstrates how to compile a VHDL circuit with the Intermetrics
VHDL toolset, intercept the intermediate C code, and compile and link it with AFIT's parallel
VHDL simulator (VSIM). VSIM simulations of an edge-triggered D flip-flop are demonstrated for
a single processor and in parallel on the Intel iPSC/2 Hypercube.


13  These  results  were  run  one  time  for  each  configuration,  and  are  for  comparison  purposes.  If  statistics  are  required, 
more  runs  would  have  to  be  made. 

14 For the 3 LP/2 node configuration, LP0 was loaded on node 0, and LPs 1 and 2 were loaded on node 1.

entity NAND_GATE is
  generic (gate_delay: TIME := 3 ns);
  port (IN_1, IN_2: in BIT := '0';
        OUT_1: out BIT := '0');
end NAND_GATE;


architecture SIMPLE of NAND_GATE is
begin
  OUT_1 <= IN_1 nand IN_2 after gate_delay;
end SIMPLE;


entity THREE_INPUT_NAND_GATE is
  generic (gate_delay: TIME := 3 ns);
  port (IN_1, IN_2, IN_3: in BIT := '0';
        OUT_1: out BIT := '0');
end THREE_INPUT_NAND_GATE;


architecture SIMPLE of THREE_INPUT_NAND_GATE is
begin
  OUT_1 <= not (IN_1 and IN_2 and IN_3)
           after gate_delay;
end SIMPLE;


Figure  58.  VHDL  Descriptions  of  Two-  and  Three-Input  NAND  Gates 


-- Lt T. Andy Breeden, GCE-92D, 4 Aug 92
-- Edge-Triggered D Flip-Flop (structural)

entity ET_DFF is
  port (D, CP: in Bit;
        Q: out Bit;
        Q_Bar: out Bit);
begin
end ET_DFF;


architecture Structural of ET_DFF is
  component A_NAND_Gate
    port (In_1, In_2: in Bit;
          Out_1: out Bit);
  end component;

  component A_3Input_NAND_Gate
    port (In_1, In_2, In_3: in Bit; Out_1: Out Bit);
  end component;

  signal X1_Out, X2_Out, X3_Out, X4_Out: Bit;
  signal X5_Out, X6_Out: Bit;

begin
  X1: A_NAND_Gate port map (X4_Out, X2_Out, X1_Out);
  X2: A_NAND_Gate port map (X1_Out, CP, X2_Out);
  X3: A_3Input_NAND_Gate port map (X2_Out, CP, X4_Out, X3_Out);
  X4: A_NAND_Gate port map (X3_Out, D, X4_Out);
  X5: A_NAND_Gate port map (X2_Out, X6_Out, X5_Out);
  X6: A_NAND_Gate port map (X5_Out, X3_Out, X6_Out);
  Q <= X5_Out;
  Q_Bar <= X6_Out;
end Structural;


Figure 59. Structural VHDL Description of Edge-triggered D Flip-flop

-- Test Bench for Edge-Triggered D Flip-Flop
-- Lt T. Andy Breeden, GCE-92D, 4 Aug 92

entity ET_DFF_Test_Bench is
end ET_DFF_Test_Bench;

architecture Structural of ET_DFF_Test_Bench is
  component Test_Circuit
    port (D, CP: in Bit;
          Q, Q_Bar: out Bit);
  end component;

  signal a, b, Ckt_Q_Out, Ckt_Q_Bar_Out: Bit;
begin
  Circuit: Test_Circuit port map (a, b,
                                  Ckt_Q_Out, Ckt_Q_Bar_Out);

  a <= '0' after 0 ns, '1' after 100 ns,
       '0' after 250 ns, '1' after 400 ns;

  b <= '0' after 0 ns, '1' after 15 ns, '0' after 20 ns,
       '1' after 50 ns, '0' after 150 ns, '1' after 200 ns,
       '0' after 300 ns, '1' after 350 ns, '0' after 450 ns,
       '1' after 500 ns;

end Structural;


Figure 60. VHDL Description of Test Bench for Edge-triggered D Flip-flop

Figure 61. Schematic of Test Bench for Edge-triggered D Flip-flop


-- Configuration file to connect Edge-Triggered
-- DFF to test bench.
-- Lt T. Andy Breeden, GCE-92D, 4 Aug 92

Library work;
use work.all;

configuration ET_DFF_Config of ET_DFF_Test_Bench is
  for Structural
    for Circuit: Test_Circuit
      use entity work.ET_DFF(Structural);
      for Structural
        for all: A_NAND_Gate
          use entity work.NAND_GATE(Simple);
        end for;
        for all: A_3Input_NAND_Gate
          use entity work.Three_Input_NAND_Gate(Simple);
        end for;
      end for;
    end for;
  end for;
end ET_DFF_Config;


Figure  62.  VHDL  Description  of  Configuration  File  for  Edge-triggered  D  Flip-flop 


-- Output for Edge-Triggered DFF simulation using
-- Intermetrics' Report Control Language (RCL)
-- Lt T. Andy Breeden, GCE-92-D, 4 Aug 92

simulation_report ET_DFF_Sim is
begin
  select_signal: a, b, Ckt_Q_Out, Ckt_Q_Bar_Out;
  sample_signals by_event in ns;
end;


Figure 63. VHDL Report Description for Edge-triggered D Flip-Flop

#!/bin/csh -v
vhdl ~/vhdl/aox_gates/nand_nor
vhdl et_dff.vhd
vhdl et_dff_test_bench
vhdl et_dff_config
mg '-debug=cknd nand_gate(simple)'
mg '-debug=cknd three_input_nand_gate(simple)'
mg '-debug=cknd et_dff(structural)'
mg '-debug=cknd et_dff_test_bench(structural)'
mg '-debug=cknd -top et_dff_config'
build '-debug=cknd -replace -ker=et_dff et_dff_config'
sim et_dff
rg et_dff et_dff.rcl


Figure  64.  Shell  Script  for  Compiling,  Model  Generating,  Building,  and  Simulating  the  Edge- 
triggered  D  Flip-flop  using  Intermetrics’  Simulator 


Figure 65. Edge-Triggered D Flip-flop Labeled with Behavior Id Numbers

Figure 66. Edge-Triggered D Flip-flop Partitioned Into 2 LPs


Figure 67. Edge-Triggered D Flip-flop Partitioned Into 3 LPs

Appendix C. Subset of VHDL Source Code for Parallel Simulation


The subset of circuits that can be simulated with VSIM includes hierarchical structural
descriptions of logic gates. This appendix discusses the subset and syntax for logic gates,
structural connections, the test bench, and configurations.

C.1 Logic Gates.

Logic  gates  are  designed  as  entity/architecture  pairs.  Input  and  output  signals  for  logic  gates 
must  be  of  type  Bit.  The  number  of  inputs  and  outputs  is  not  restricted.  Default  values  may  be 
assigned.  Gate  delays  are  of  type  time,  and  may  be  constants  or  generics.  Processes  may  use  wait 
statements  only  if  they  wait  on  all  inputs.  Logical  operators  and,  or,  nand,  nor,  and  xor  may  be 
used.  The  adding  operator  (+)  may  be  used  to  add  values  of  type  time.  Here  are  some  examples 
of  acceptable  logic  gate  descriptions: 

entity AND_GATE is
  generic (gate_delay: TIME := 3 ns);
  port (IN_1, IN_2: in BIT := '0';
        OUT_1: out BIT := '0');
end AND_GATE;

architecture SIMPLE of AND_GATE is
begin
  OUT_1 <= IN_1 and IN_2 after gate_delay;
end SIMPLE;


Entity THREE_INPUT_AND is
  Port (in_1, in_2, in_3 : in BIT := '0'; out_1 : out BIT := '0');
  Constant Delay : Time := 5 ns;
end THREE_INPUT_AND;

Architecture BEHAV_3AND of THREE_INPUT_AND is
begin
  process begin
    OUT_1 <= IN_1 and IN_2 and IN_3 after delay;
    wait on IN_1, IN_2, IN_3;
  end process;
end BEHAV_3AND;


entity GND_BOX is
  Port (GZ : Out Bit);
end GND_BOX;

architecture BEHAVIORAL of GND_BOX is
begin
  GZ <= '0';
end BEHAVIORAL;


C.2 Structural Connection of Logic Gates.

Circuits are built hierarchically in entity/architecture pairs by structurally connecting logic
gate components or other structural descriptions. Assertions can be raised at this point. An
assertion of type error or fatal will abort the simulation. Component port maps use either
named or positional notation for signal assignments. Bit vectors may also be used. Here is an
example of an SR flip-flop that structurally connects two nor gates:


entity SRFF is
  port (S, R: in Bit;
        Q: out Bit;
        Q_Bar: out Bit);
begin
  SRFF_Constraint_Check:
    assert not (S='1' and R='1')
      report "Both S and R equal to '1'"
      severity Error;
end SRFF;

architecture Structural of SRFF is
  component A_NOR_Gate
    port (In_1, In_2: in Bit; Out_1: out Bit);
  end component;

  signal Q_Bar_In: Bit;
  signal Q_In: Bit;

begin
  X1: A_NOR_Gate port map (R, Q_Bar_In, Q_In);
  X2: A_NOR_Gate port map (Q_In, S, Q_Bar_In);
  Q <= Q_In;
  Q_Bar <= Q_Bar_In;
end Structural;


This carry save adder shows named notation, use of bit vectors, and structural connections
of both gates (inverters) and full adders which are structurally defined elsewhere:


entity CSA8 is
  Port (A          : In  Bit_VECTOR (7 downto 0) := "00000000";
        B          : In  Bit_VECTOR (7 downto 0) := "00000000";
        C          : In  Bit_VECTOR (7 downto 0) := "00000000";
        HI_CSA_BIT : In  Bit := '0';
        LO_CSA_BIT : In  Bit := '0';
        CARRY      : Out Bit_VECTOR (7 downto 0) := "00000000";
        HI_SUM_BIT : Out Bit := '0';
        LO_SUM_BIT : Out Bit := '0';
        SUM        : Out Bit_VECTOR (7 downto 0) := "00000000" );
end CSA8;


architecture SCHEMATIC of CSA8 is

  signal N_1 : Bit;
  signal N_2 : Bit;

  component INV_1
    Port (In_1  : In  Bit := '0';
          Out_1 : Out Bit := '0' );
  end component;

  component FULL_ADDER
    Port (AIN   : In  Bit := '0';
          BIN   : In  Bit := '0';
          CIN   : In  Bit := '0';
          CARRY : Out Bit := '0';
          SUM   : Out Bit := '0' );
  end component;

begin

  I_9 : INV_1
    Port Map ( In_1=>N_2, Out_1=>LO_SUM_BIT );
  I_10 : INV_1
    Port Map ( In_1=>A(0), Out_1=>N_2 );
  I_11 : INV_1
    Port Map ( In_1=>N_1, Out_1=>HI_SUM_BIT );
  I_12 : INV_1
    Port Map ( In_1=>C(7), Out_1=>N_1 );
  I_1 : FULL_ADDER
    Port Map ( AIN=>HI_CSA_BIT, BIN=>B(7), CIN=>C(6), CARRY=>CARRY(7),
               SUM=>SUM(7) );
  I_2 : FULL_ADDER
    Port Map ( AIN=>A(7), BIN=>B(6), CIN=>C(5), CARRY=>CARRY(6),
               SUM=>SUM(6) );
  I_3 : FULL_ADDER
    Port Map ( AIN=>A(6), BIN=>B(5), CIN=>C(4), CARRY=>CARRY(5),
               SUM=>SUM(5) );
  I_4 : FULL_ADDER
    Port Map ( AIN=>A(5), BIN=>B(4), CIN=>C(3), CARRY=>CARRY(4),
               SUM=>SUM(4) );
  I_5 : FULL_ADDER
    Port Map ( AIN=>A(4), BIN=>B(3), CIN=>C(2), CARRY=>CARRY(3),
               SUM=>SUM(3) );
  I_6 : FULL_ADDER
    Port Map ( AIN=>A(3), BIN=>B(2), CIN=>C(1), CARRY=>CARRY(2),
               SUM=>SUM(2) );
  I_7 : FULL_ADDER
    Port Map ( AIN=>A(2), BIN=>B(1), CIN=>C(0), CARRY=>CARRY(1),
               SUM=>SUM(1) );
  I_8 : FULL_ADDER
    Port Map ( AIN=>A(1), BIN=>B(0), CIN=>LO_CSA_BIT, CARRY=>CARRY(0),
               SUM=>SUM(0) );

end SCHEMATIC;


C.3  Test  Bench  and  Input  Vectors. 

Test benches are used to connect the circuit under test to a series of input test signals. The
inputs may be of type bit or bit_vector; however, each bit of a bit vector must be assigned values
individually. VSIM terminates after 2000 ns; therefore, no input signal should be assigned a value
beyond 2000 ns. Here is an example of a test bench for the 16-bit bit/byte shifter:

entity Shifter_TB is
end Shifter_TB;

architecture Structural of Shifter_TB is
  component Test_Circuit
    Port ( SHIFTER_CONTROL : In  Bit_Vector (2 downto 0);
           SHIFTER_INPUT   : In  Bit_Vector (15 downto 0);
           SHIFTER_OUTPUT  : Out Bit_Vector (15 downto 0) );
  end component;

  signal Control: Bit_Vector(2 downto 0);
  signal Input: Bit_Vector(15 downto 0);
  signal Output: Bit_Vector(15 downto 0);

begin

  Circuit: Test_Circuit port map (Control, Input, Output);

  -- Use Input = 0101010101010101 after 10 ns, then
  -- 0000111100001111 after 250 ns.

  Input(0)  <= '1' after 10 ns, '1' after 250 ns;
  Input(1)  <= '0' after 10 ns, '1' after 250 ns;
  Input(2)  <= '1' after 10 ns, '1' after 250 ns;
  Input(3)  <= '0' after 10 ns, '1' after 250 ns;
  Input(4)  <= '1' after 10 ns, '0' after 250 ns;
  Input(5)  <= '0' after 10 ns, '0' after 250 ns;
  Input(6)  <= '1' after 10 ns, '0' after 250 ns;
  Input(7)  <= '0' after 10 ns, '0' after 250 ns;
  Input(8)  <= '1' after 10 ns, '1' after 250 ns;
  Input(9)  <= '0' after 10 ns, '1' after 250 ns;
  Input(10) <= '1' after 10 ns, '1' after 250 ns;
  Input(11) <= '0' after 10 ns, '1' after 250 ns;
  Input(12) <= '1' after 10 ns, '0' after 250 ns;
  Input(13) <= '0' after 10 ns, '0' after 250 ns;
  Input(14) <= '1' after 10 ns, '0' after 250 ns;
  Input(15) <= '0' after 10 ns, '0' after 250 ns;

  -- Check left shift, right shift,
  -- left shift 8, right shift 8, pass

  Control(0) <= '1' after 20 ns, '0' after 50 ns,
                '1' after 100 ns, ' after 150 ns, '0' after 200 ns,
                '1' after 300 ns, '0' after 350 ns,
                '1' after 400 ns, ' after 450 ns, '0' after 500 ns;
  Control(1) <= '0' after 20 ns, '1' after 50 ns,
                '1' after 100 ns, ' after 150 ns, '0' after 200 ns,
                '0' after 300 ns, '1' after 350 ns,
                '1' after 400 ns, ' after 450 ns, '0' after 500 ns;
  Control(2) <= '0' after 20 ns, '0' after 50 ns,
                '0' after 100 ns, ' after 150 ns, '0' after 200 ns,
                '0' after 300 ns, '0' after 350 ns,
                '0' after 400 ns, ' after 450 ns, '0' after 500 ns;

end Structural;


C.4  Configuration  Descriptions. 

Configuration specifications are used to bind component instances to design entities.
Configurations may either be assigned all at once at the top level, or at each intermediate step in
hierarchical fashion. The latter saves a great deal of file space with respect to the intermediate C
code; this increases the chances that large circuits will compile on the hypercubes without running
out of memory.

The  following  is  an  example  of  a  single  top-level  configuration  for  the  carry  lookahead  adder: 


use WORK.TEST_CL_ADDER;

Configuration S_CONF_CLA of TEST_CL_ADDER is
    for INSTANTIATE_CL_ADDER
        for CLA : CARRY_LOOKAHEAD_ADDER
            use Entity WORK.CARRY_LOOKAHEAD_ADDER(STRUCT_CLA);
            for STRUCT_CLA
                for all : AND_GATE
                    use Entity WORK.AND_GATE(SIMPLE);
                end for;
                for all : THREE_INPUT_AND
                    use Entity WORK.THREE_INPUT_AND(BEHAV_3AND);
                end for;
                for all : FOUR_INPUT_AND
                    use Entity WORK.FOUR_INPUT_AND(BEHAV_4AND);
                end for;
                for all : FIVE_INPUT_AND
                    use Entity WORK.FIVE_INPUT_AND(BEHAV_5AND);
                end for;
                for all : OR_GATE
                    use Entity WORK.OR_GATE(SIMPLE);
                end for;
                for all : THREE_INPUT_OR
                    use Entity WORK.THREE_INPUT_OR(BEHAV_3OR);
                end for;
                for all : FOUR_INPUT_OR
                    use Entity WORK.FOUR_INPUT_OR(BEHAV_4OR);
                end for;
                for all : FIVE_INPUT_OR
                    use Entity WORK.FIVE_INPUT_OR(BEHAV_5OR);
                end for;
                for all : XOR_GATE
                    use Entity WORK.XOR_GATE(SIMPLE);
                end for;
            end for;
        end for;
    end for;
end S_CONF_CLA;


The  Wallace  tree  is  an  example  of  using  hierarchical  configurations.  First,  full  adders  are 
configured  with  logic  gates,  then  carry  save  adders  are  configured  with  full  adders  (and  more 
inverters),  etc.  While  this  method  has  the  benefit  of  smaller  intermediate  C  code,  the  postprocessor 
output  must  be  modified  as  explained  on  page  95  of  Appendix  B.  Here  are  the  Wallace  tree 
configuration  descriptions: 


configuration CFG_FULL_ADDER of Work.FULL_ADDER is
    for SCHEMATIC
        for I_1, I_2, INV_1CARRY, INV_1A, INV_1B, INV_1C: INV_1
            use entity WORK.Inv(Simple);
        end for;
        for NANDBC, NANDCARRY, NANDAC, NANDSUM, NANDAB: NAND_2
            use entity WORK.Nand_Gate(Simple);
        end for;
        for NAND3CARRY, NAND3OR, NAND3ABC: NAND_3
            use entity WORK.Three_Input_Nand_Gate(Simple);
        end for;
    end for;
end CFG_FULL_ADDER;

configuration CFG_CSA8 of Work.CSA8 is
    for SCHEMATIC
        for I_9, I_10, I_11, I_12: INV_1
            use entity WORK.Inv(Simple);
        end for;
        for I_1, I_2, I_3, I_4, I_5, I_6, I_7, I_8: FULL_ADDER
            use configuration WORK.CFG_FULL_ADDER;
        end for;
    end for;
end CFG_CSA8;

configuration CFG_WALLACE_TREE_1 of Work.WALLACE_TREE_1 is
    for SCHEMATIC
        for I_17, I_18, I_19, I_20, I_21, I_22, I_23: GND_BOX
            use entity WORK.Gnd_Box(Behavioral);
        end for;
        for I_11, I_12: FULL_ADDER
            use configuration WORK.CFG_FULL_ADDER;
        end for;
        for I_15, I_16, I_13, I_14, I_8, I_9: INV_1
            use entity WORK.Inv(Simple);
        end for;
        for I_6, I_4, I_5, I_2, I_3: CSA8
            use configuration WORK.CFG_CSA8;
        end for;
        for I_1: MCAND_GEN
            use configuration WORK.CFG_MCAND_GEN;
        end for;
    end for;
end CFG_WALLACE_TREE_1;

configuration CFG_WALLACE_TREE_2 of Work.WALLACE_TREE_2 is
    for SCHEMATIC
        for I_26, I_27, I_25: GND_BOX
            use entity WORK.GND_BOX(Behavioral);
        end for;
        for I_21, I_22: INV_1
            use entity WORK.Inv(Simple);
        end for;
        for I_23, I_24, I_20, I_9, I_10, I_11, I_6, I_12, I_13, I_14, I_15,
            I_16, I_17, I_18, I_19, I_1, I_2, I_3, I_4, I_5: FULL_ADDER
            use configuration WORK.CFG_FULL_ADDER;
        end for;
        for I_8: WALLACE_TREE_1
            use configuration WORK.CFG_WALLACE_TREE_1;
        end for;
    end for;
end CFG_WALLACE_TREE_2;

configuration CGF_Wallace_TB of Work.Wallace_TB is
    for Structural
        for Circuit: Test_Circuit
            use configuration work.CFG_WALLACE_TREE_2;
        end for;
    end for;
end CGF_Wallace_TB;
Appendix D. Design of the Wallace Tree Multiplier


The Wallace tree multiplier is the largest circuit simulated with VSIM on the Intel Hypercubes.
It is created and verified in MVL-7 logic using Synopsys design tools. For AFIT VSIM simulations,
MVL-7 bits and bit vectors are changed to type bit and bit_vector.

The hierarchical design of the multiplier has two advantages. First, the corresponding
intermediate C code from Intermetrics' compiler is smaller than intermediate C code for an equivalent
large, flat circuit description. Second, breaking the multiplier into hierarchical components
provides logical, concurrent subcomponents that may be partitioned among the nodes of a parallel
computer.

The  design  is  taken  from  Hwang  and  Briggs  (19).  Figure  68  shows  the  multiplier  as  a  tree 
of  carry  save  adders  followed  by  a  carry  propagate  adder.  Two  eight  bit  numbers,  A  and  B,  are 
fed  into  a  multiplicand  generator  which  generates  intermediate  results  and  shifts  them  accordingly. 
These  results  go  through  the  series  of  carry  save  adders,  and  then  the  carry  propagate  adder  where 
the  twelve  bit  product,  P,  is  generated. 
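The data flow just described can be sketched in C on whole machine words. This is only an illustration under stated assumptions: the names carry_save_add and csa_tree_multiply are hypothetical, the rows are reduced in a simple chain rather than in the tree arrangement of the thesis schematics, and the function returns the full product rather than modeling the width of P in the actual design.

```c
#include <assert.h>
#include <stdint.h>

/* One carry-save adder stage applied bitwise to three words.
   It turns three addends into a sum word and a carry word without
   propagating carries: a + b + c == *sum + *carry. */
static void carry_save_add(uint32_t a, uint32_t b, uint32_t c,
                           uint32_t *sum, uint32_t *carry)
{
    *sum   = a ^ b ^ c;
    *carry = ((a & b) | (a & c) | (b & c)) << 1;
}

/* Multiply two 8-bit operands: generate shifted partial products,
   reduce them with carry-save adders, then finish with one ordinary
   (carry-propagate) addition. */
uint32_t csa_tree_multiply(uint8_t a, uint8_t b)
{
    uint32_t rows[8];
    int n = 8;
    int i;

    /* Partial product generation: row i is A shifted left by i
       when bit i of B is set, otherwise zero. */
    for (i = 0; i < 8; i++)
        rows[i] = ((b >> i) & 1u) ? ((uint32_t)a << i) : 0u;

    /* Replace three rows by two until only two rows remain. */
    while (n > 2) {
        uint32_t s, c;
        carry_save_add(rows[n - 1], rows[n - 2], rows[n - 3], &s, &c);
        rows[n - 3] = s;
        rows[n - 2] = c;
        n -= 1;
    }
    assert(n == 2);

    /* Carry-propagate addition of the last two rows. */
    return rows[0] + rows[1];
}
```

Calling csa_tree_multiply(13, 11) yields 143. Each carry-save stage preserves the total of the rows, which is why only the final addition needs to propagate carries.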

The VHDL hierarchy is shown in Figure 69. The overall circuit, wallace_tree_2, consists of
wallace_tree_1 and a set of full adders that make the carry propagate adder. The wallace_tree_1
description includes the multiplicand generator and the carry save adders. In turn, the carry save
adders are made with full adders. All full adders are composed of nand gates and inverters.
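As a rough illustration of a full adder built only from nand gates and inverters, the following C sketch models each gate as a function. The gate arrangement is an assumption chosen because it uses two-input nands, three-input nands, and inverters; it is not taken from Figure 75, and the names nand2, nand3, inv, and full_adder are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int nand2(int a, int b)        { return !(a && b); }
static int nand3(int a, int b, int c) { return !(a && b && c); }
static int inv(int a)                 { return !a; }

/* Full adder from nand gates and inverters (assumed arrangement).
   carry = ab + ac + bc
   sum   = (a + b + c) * carry' + abc */
void full_adder(int a, int b, int cin, int *sum, int *cout)
{
    int carry, any_set, not_all_set;

    assert(sum != NULL && cout != NULL);

    /* Nand-nand form of the majority function. */
    carry = nand3(nand2(a, b), nand2(a, cin), nand2(b, cin));

    /* Nand of the inverted inputs is the OR of the inputs. */
    any_set = nand3(inv(a), inv(b), inv(cin));

    /* Low only when all three inputs are high. */
    not_all_set = nand3(a, b, cin);

    *sum  = nand2(nand2(any_set, inv(carry)), not_all_set);
    *cout = carry;
}
```

For inputs a = 1, b = 1, cin = 0 the sketch gives sum = 0 and carry = 1, as expected for binary addition.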

The schematics for all components are as follows: wallace_tree_2 is shown in Figure 70,
wallace_tree_1 is shown in Figure 71, the multiplicand generator and two "subgenerators" are shown
in Figures 72 and 73, the carry save adder is shown in Figure 74, and the full adder is shown in
Figure 75.
P = A x B

Figure 68. Wallace Tree Multiplier

Figure 69. Hierarchy of VHDL Source Code for the Wallace Tree Multiplier

Figure 70. Top Level Schematic of Wallace Tree (wallace_tree_2)

Figure 71. Schematic of Carry Save Adder Tree (wallace_tree_1)

Figure 72. Multiplicand Generator

Figure 73. Multiplicand Subgenerator

Figure 74. Carry Save Adder used in Wallace Tree Multiplier

Figure 75. Full Adder used in Wallace Tree Multiplier


Appendix  E.  Summary  of  Performance  Data 


Each simulation configuration is summarized in Tables 9 and 10. The times reported
correspond to the average execution time of 30 simulations per configuration. The reported speedups
are based on the simulation time of the slowest LP, neglecting the overhead of initializing and
closing each process. For example, if a two-LP simulation is run and LP0 reports a time of 50 ns
while LP1 reports a time of 53 ns, the time for the simulation is considered to be 53 ns. Speedups
are relative to one-LP simulations of otherwise identical configurations.
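The timing rule above can be written as a small C sketch. The function names run_time and speedup are hypothetical, and the one-LP time used in the usage note is a made-up value for illustration, not a measurement from the tables.

```c
#include <assert.h>
#include <stddef.h>

/* Time of one simulation run: the time reported by the slowest LP. */
double run_time(const double *lp_times, size_t num_lps)
{
    double slowest;
    size_t i;

    assert(lp_times != NULL && num_lps > 0);

    slowest = lp_times[0];
    for (i = 1; i < num_lps; i++)
        if (lp_times[i] > slowest)
            slowest = lp_times[i];
    return slowest;
}

/* Speedup relative to the one-LP run of the same configuration. */
double speedup(double one_lp_time, double multi_lp_time)
{
    assert(multi_lp_time > 0.0);
    return one_lp_time / multi_lp_time;
}
```

With LP times of 50 and 53, run_time returns 53. If the corresponding one-LP run had taken 106 (a hypothetical figure), speedup(106, 53) would be 2.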
Table 9. Summary of Performance Data

Circuit | Hypercube | LPs | Nodes | Input Vectors | Time (ms) | Std Dev | Min | Max | Speedup
Appendix F. New Postprocessor Steps


The  postprocessor  modifies  the  intermediate  C  code  using  the  10  steps  Comeau  described  in 
his  thesis  (10),  as  well  as  two  new  steps.  The  first  new  step  is  to  delete  the  following  unnecessary 
function  calls: 


•  close_sigdict()

•  m_int_type()

•  m_real_type()

•  m_real_type()

•  m_signal()

•  pop()

•  push()

•  read_input()

•  rmtrrec()

•  rptstats()

•  rpterr()

•  Start_Nonarray_Comp()

•  sched()

•  timer()

•  tpop()

The second new step is to modify every behavior instance's "function behavior" to report its
entity/architecture name if MAPPING is defined in VSIM and the boolean variable mapping is true.
Each of these function declarations is of the form Zxxxxxxx_xxxx(bi). Inside the function, after
local declarations, the following is added:


#ifdef MAPPING
    if (mapping)
        printf("%s\n", Zxxxxxxx_xxxx_trcbck);
#endif

Here is an example of a behavior instance function declaration prior to adding the new code:


static  void 
Z000002T_4440(bi) 

BHP  bi; 

{ 

Z000002T_4112_struct  *cd  = 
(Z000002T_4112_struct  *)bi->data; 


And  here  is  the  modified  behavior  instance  function: 


static void
Z000002T_4440(bi)
BHP bi;
{
    Z000002T_4112_struct *cd =
        (Z000002T_4112_struct *)bi->data;

#ifdef MAPPING
    if (mapping)
        printf("%s\n", Z000002T_4440_trcbck);
#endif
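The effect of the added lines can be tried in isolation with a stand-alone C sketch. Everything here is a stand-in: example_behavior and example_trcbck are hypothetical names, the name string is invented, and MAPPING is defined in the file itself, whereas the text says it is defined in VSIM.

```c
#include <assert.h>
#include <stdio.h>

#define MAPPING              /* assumed: normally set when VSIM is built */

int mapping = 1;             /* stand-in for the boolean variable mapping */

/* Hypothetical entity/architecture string for one behavior instance. */
const char example_trcbck[] = "EXAMPLE_ENTITY(EXAMPLE_ARCH)";

/* Stand-in behavior function. Returns the number of characters
   reported, or 0 when nothing was printed. */
int example_behavior(void)
{
    int reported = 0;

    assert(example_trcbck[0] != '\0');

#ifdef MAPPING
    if (mapping)
        reported = printf("%s\n", example_trcbck);
#endif
    return reported;
}
```

The compile-time guard removes the report entirely when MAPPING is undefined, while the run-time flag lets a build with MAPPING defined still run silently.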
Appendix G. Key Source Code


This  Appendix  describes  some  of  the  key  source  code  necessary  to  implement  parallel  VHDL 
simulations  with  SPECTRUM.  The  code  presented  concerns  interfacing  VSIM  and  SPECTRUM. 
It  is  important  to  recognize  that  an  event  is  logically  equivalent  to  a  signal  change  that  is  passed 
from  one  LP  to  another.  A  complete  code  listing  is  presented  in  a  second  volume. 


G.1 vspec_init().

This routine builds a table of function pointers for SPECTRUM. Each function pointer
represents the starting code for the simulation on each LP. For VSIM, all LPs start with the routine
startup(). Therefore, every entry in functions[] is loaded with the address of startup(). Also, a
call to read_mapping() is made so VSIM can determine which LPs are assigned which behaviors.
Finally, a call is made to SPECTRUM's lp_level_init(), where SPECTRUM initializes and each LP
calls startup(). Here is the code for vspec_init(), which is found in the file vspec.c:


void vspec_init()
{
    void (*functions[NUM_PROCS])();
    char *args[NUM_PROCS];
    char *argument;
    int i;

    /* initialize function pointers and lp #s as arguments */
    for (i = 0; i < NUM_PROCS; i++) {
        functions[i] = startup;
        argument = (char *)malloc(5*sizeof(char));
        sprintf(argument, "%d", i);
        args[i] = argument;
    }

    read_mapping();  /* read in lpx.map file */

    lp_level_init(functions, args);
}
G.2 startup().

This routine, also found in vspec.c, is called by SPECTRUM after initialization. SPECTRUM
passes each LP its LP number through startup(), and startup() calls lp_init() so SPECTRUM can
initialize the VSIM filters. Finally, vhdl_main() is called in the intermediate C code so the circuit
may be constructed. The source code for startup() is as follows:

void startup(lp_no)
char *lp_no;
{
    sscanf(lp_no, "%d", &my_lp);  /* set global my_lp */
    free(lp_no);

    lp_init(my_lp);  /* set up filter tables */

    vhdl_main();
}
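The pattern used by vspec_init() and startup(), a table of start routines plus LP numbers passed as strings, can be exercised without SPECTRUM using the following sketch. All demo_ names are hypothetical, NUM_PROCS is given an arbitrary value, and demo_dispatch simply calls each table entry in turn; how SPECTRUM actually starts each LP is not shown in the source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_PROCS 4                 /* illustrative value only */

static int parsed_lp[NUM_PROCS];    /* what each start routine parsed */

/* Stand-in for startup(): recover the LP number from its string
   argument, record it, and release the string. */
static void demo_startup(char *lp_no)
{
    int lp = -1;

    if (sscanf(lp_no, "%d", &lp) == 1 && lp >= 0 && lp < NUM_PROCS)
        parsed_lp[lp] = lp;
    free(lp_no);
}

/* Stand-in for the LP start-up done by the simulation testbed. */
static void demo_dispatch(void (*functions[])(char *), char *args[])
{
    int i;

    for (i = 0; i < NUM_PROCS; i++)
        functions[i](args[i]);
}

/* Build the table as vspec_init() does, dispatch, and check that
   every LP saw its own number. Returns 1 on success. */
int demo_init(void)
{
    void (*functions[NUM_PROCS])(char *);
    char *args[NUM_PROCS];
    int i, ok = 1;

    for (i = 0; i < NUM_PROCS; i++) {
        parsed_lp[i] = -1;
        functions[i] = demo_startup;
        args[i] = malloc(16);
        assert(args[i] != NULL);
        snprintf(args[i], 16, "%d", i);   /* bounded, unlike sprintf */
    }

    demo_dispatch(functions, args);

    for (i = 0; i < NUM_PROCS; i++)
        ok = ok && (parsed_lp[i] == i);
    return ok;
}
```

The sketch uses snprintf with an explicit buffer size; the original listing allocates five characters, which is enough as long as the LP number has at most four digits.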


G.3 send_signal().

This routine, also in the file vspec.c, is used to build an event out of a signal record, and call
SPECTRUM's lp_post_event(), as follows:


void send_signal(this_signal_rec, dest)
SIG_REC *this_signal_rec;
int dest;
{
    struct event *new_event;

    new_event = (struct event *)malloc(sizeof(struct event));
    new_event->from_lp = my_lp;
    new_event->to_lp = dest;
    new_event->time = *sim_time;
    new_event->event = SIGNAL_CHANGE;
    new_event->id = this_signal_rec->sr_ptr->id;
    new_event->value = this_signal_rec->value;
    new_event->next = NULL;

    lp_post_event(new_event);
    free(new_event);
}


G.4 receive_signal().

This function passes received events from other LPs (via SPECTRUM) to VSIM. First, a call
to SPECTRUM is made in lp_get_event(). This also activates the receive filter, shown later. If the
filter passes receive_signal() a null pointer, then it returns to VSIM without posting a signal record
into the local active list. Otherwise, the newly received event is converted into a signal record and
posted directly into the active list. This function, shown below, is found in the file vspec.c.


void receive_signal()
{
    struct event *event;
    SIG_REC *new_sig_rec;
    SRP signal;
    int value;
    int time;

    event = lp_get_event();

    if (event != NULL) {
        signal = srrec_ptr[event->id];
        value = event->value;
        time = event->time;
        MAKE_SIG_REC(signal, value, time);
        insert_sig_rec(new_sig_rec);
        free(event);
    }
}
G.5 null_post_fltr().

This is the filter used when VSIM sends an event to another LP. The filter is logically
equivalent to AFIT's chanclocks post filter. For VSIM, the filter tracks the times a message was sent on
each output arc. Also, when an event is sent to one LP, this filter sends a null message to all other
output arcs. The post filter, found in the file vfilt.c, is as follows:


void null_post_fltr()
{
    int i;

    /* update output channel time for this message */
    if (event_to_post->event != NULL_MSG && NUM_OUT_LPS > 0) {
        for (i = 0; i < NUM_OUT_ARCS; i++) {
            if (OUT_ARCS(i) == event_to_post->to_lp)
                output_ctime[i] = event_to_post->time;
        }
    }

    /* send nulls to other lps */
    if (event_to_post->to_lp != my_lpid) {
        for (i = 0; i < NUM_OUT_ARCS; i++) {
            if (OUT_ARCS(i) != event_to_post->to_lp)
                send_null(OUT_ARCS(i), event_to_post->time);
        }
    }
}
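The rule the post filter applies to one outgoing real event can be modeled with plain arrays. This is a simplified sketch under assumptions: the arc table contents are invented, null messages are recorded in an array rather than sent, and the name post_event_with_nulls is hypothetical.

```c
#include <assert.h>

#define NUM_OUT_ARCS 3                               /* illustrative */

/* Hypothetical downstream LP ids, one per output arc. */
static const int out_arcs[NUM_OUT_ARCS] = {1, 2, 5};

int output_ctime[NUM_OUT_ARCS];    /* time of last real message per arc */
int null_time_sent[NUM_OUT_ARCS];  /* time of last null message per arc */

/* Apply the post-filter rule to one real event bound for to_lp:
   record its time on the matching arc and issue a null message with
   the same time on every other arc. Returns the number of nulls. */
int post_event_with_nulls(int to_lp, int time)
{
    int i, nulls = 0;

    assert(time >= 0);

    for (i = 0; i < NUM_OUT_ARCS; i++) {
        if (out_arcs[i] == to_lp) {
            output_ctime[i] = time;
        } else {
            null_time_sent[i] = time;
            nulls++;
        }
    }
    return nulls;
}
```

Sending nulls alongside every real event keeps the other downstream LPs informed of how far this LP has advanced, at the cost of extra message traffic.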


G.6 able_to_proceed().

This routine, found in the file vfilt.c, determines whether (1) at least one message has been
received from every upstream LP, and (2) the time stamp of the next event in SPECTRUM's input
queue does not exceed the safe time, as follows:


BOOL able_to_proceed()
{
    if (event_list == NULL)
        return FALSE;

    /* if haven't yet received a message from everybody */
    if (safetime() == -1)
        return FALSE;

    /* if still may get an earlier message */
    if (event_list->time > safetime())
        return FALSE;

    return TRUE;
}


G.7 safetime().

This routine determines the minimum input channel time for all input arcs. It is found in
vfilt.c, and is listed as follows:


int safetime()
{
    int min_input_ctime = input_ctime[0];
    int i;

    for (i = 1; i < NUM_IN_ARCS; i++) {
        if (input_ctime[i] < min_input_ctime)
            min_input_ctime = input_ctime[i];
    }

    return (min_input_ctime);
}
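Taken together with able_to_proceed() in G.6, the safe-time test can be tried on plain arrays. The sketch below assumes that an arc which has not yet delivered a message holds -1 and that all real time stamps are non-negative, so the minimum is -1 whenever any arc is still silent. The names safe_time, may_process, and NO_MESSAGE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_MESSAGE (-1)   /* assumed marker for a silent input arc */

/* Minimum channel time over all input arcs. */
int safe_time(const int *input_ctime, int num_in_arcs)
{
    int min_time, i;

    assert(input_ctime != NULL && num_in_arcs > 0);

    min_time = input_ctime[0];
    for (i = 1; i < num_in_arcs; i++)
        if (input_ctime[i] < min_time)
            min_time = input_ctime[i];
    return min_time;
}

/* An event may be processed only if every input arc has reported and
   the event is not later than the safe time. Returns 1 or 0. */
int may_process(int event_time, const int *input_ctime, int num_in_arcs)
{
    int safe = safe_time(input_ctime, num_in_arcs);

    if (safe == NO_MESSAGE)
        return 0;
    return event_time <= safe;
}
```

With channel times of 30, 12, and 25 the safe time is 12, so an event stamped 12 may be processed while one stamped 13 must wait for more input.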


G.8 send_nulls().

This function is used to send null messages to all downstream LPs prior to blocking for
incoming messages (explained below in null_get_fltr()). The time stamp of the null messages is the
minimum of (1) the "low time" of the active list in VSIM, and (2) the safe time plus the output
delay for the local LP. The code, found in vfilt.c, is as follows:


void send_nulls()
{
    int i;
    int safe_time = safetime();
    int vhdl_low_time = get_low_time();

    for (i = 0; i < NUM_OUT_ARCS; i++)
        if (OUT_ARCS(i) != my_lpid)
            if (vhdl_low_time < safe_time + LP_OUT_DELAYS(i))
                send_null(OUT_ARCS(i), vhdl_low_time);
            else
                send_null(OUT_ARCS(i), safe_time + LP_OUT_DELAYS(i));
}
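The time stamp rule in send_nulls() reduces to taking a minimum, which the following C sketch isolates for a single output arc. The function name null_timestamp and the numbers in the usage note are hypothetical.

```c
#include <assert.h>

/* Time stamp for a null message on one output arc: the smaller of
   the simulator's low time and the safe time plus the arc's delay. */
int null_timestamp(int vhdl_low_time, int safe_time, int out_delay)
{
    int bound;

    assert(out_delay >= 0);

    bound = safe_time + out_delay;
    return (vhdl_low_time < bound) ? vhdl_low_time : bound;
}
```

For a low time of 100, a safe time of 60, and an output delay of 10, the null message is stamped 70; if the low time were 65 instead, the stamp would be 65.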


G.9 null_get_fltr().

This is the filter used when receiving events from upstream LPs. First, if there are no upstream
LPs, the function simply returns and VSIM continues. Otherwise, the filter gets a message (if
able_to_proceed()), and returns it to VSIM if it is not a null message. The event is passed to VSIM
by removing it from SPECTRUM's input queue and assigning it to SPECTRUM's global variable
called current_event. If an event is not ready (not able_to_proceed()), then the filter determines if
VSIM can continue without an event, i.e., if VSIM's low time is less than or equal to the safe time.
The code, found in vfilt.c, is as follows:

void null_get_fltr()
{
    int vhdl_low_time = get_low_time();
    BOOL found = FALSE;
    EVENT *temp;

    if (NUM_IN_LPS == 0)
        return;

    while (!found) {
        if (able_to_proceed()) {
            if (event_list->event != NULL_MSG)
                found = TRUE;
            else {
                temp = event_list;
                event_list = event_list->next;
                node_trash_event(temp);
            }
        }
        else {
            if (vhdl_low_time <= safetime())
                return;
            else {
                send_nulls();
                while (!able_to_proceed())
                    node_block_til_message();
            }
        }
    }

    current_event = event_list;
    event_list = event_list->next;